This report explores a dataset of NBA salaries for the past ten years and player performance statistics over regular NBA seasons from 2007/2008 to 2016/2017. The datasets were taken from kaggle.com (https://www.kaggle.com/drgilermo/nba-players-stats/home) and from data.world (https://data.world/datadavis/nba-salaries/). The source for both datasets is the website https://www.basketball-reference.com/. The two datasets were merged and filtered so that the final dataset contains player performance statistics and salaries going back 10 years.
Player performance statistics include the number of games (G), total minutes played (MP), widely used advanced statistics such as player efficiency rating (PER) and true shooting percentage (TS.), statistics on player performance relative to the number of available opportunities (such as block, steal and rebound percentages), as well as simple per-game averages such as points per game (PPG), assists per game (ASTPG), total rebounds per game (TRBPG) etc. Please refer to this glossary for the full list of basketball performance indicators: https://www.basketball-reference.com/about/glossary.html
I filtered the dataset so that only players who played more than 5 games for a given team in a given season are considered, in order to remove outliers.
Furthermore, salary data were adjusted for inflation and are presented in 2018 dollars.
## [1] 2254 73
## 'data.frame': 2254 obs. of 73 variables:
## $ Player : Factor w/ 4749 levels "","Aaron Brooks",..: 1495 1769 2275 2275 2275 1874 2318 2318 2318 2318 ...
## $ height : int 203 193 211 211 211 198 198 198 198 198 ...
## $ weight : int 102 92 108 108 108 98 96 96 96 96 ...
## $ born : int 1972 1973 1976 1976 1976 1974 1978 1978 1978 1978 ...
## $ Year : num 2012 2012 2013 2014 2015 ...
## $ Pos : Factor w/ 5 levels "C","PF","PG",..: 4 3 1 1 2 4 5 5 5 5 ...
## $ Age : int 39 38 36 37 38 37 33 34 35 36 ...
## $ Tm_curr_s : Factor w/ 68 levels "","AND","ATL",..: 46 18 6 7 7 3 30 30 30 30 ...
## $ Tm_next_s : Factor w/ 68 levels "","AND","ATL",..: 29 41 7 34 34 7 30 30 30 30 ...
## $ year_start : int 1995 1995 1996 1996 1996 1996 1997 1997 1997 1997 ...
## $ year_end : int 2013 2013 2016 2016 2016 2013 2016 2016 2016 2016 ...
## $ career_year : int 17 17 17 18 19 16 15 16 17 18 ...
## $ age_at_car_start: int 23 22 20 20 20 22 19 19 19 19 ...
## $ salary : num 1.98 3.09 12.44 12 8.5 ...
## $ adj : num 1.09 1.09 1.08 1.06 1.06 ...
## $ salary_infl : num 2.16 3.38 13.42 12.73 9.01 ...
## $ G : int 49 48 68 54 42 30 58 78 6 35 ...
## $ GS : int 46 48 68 54 42 0 58 78 6 35 ...
## $ MP : int 1378 1379 2022 1109 854 273 2232 3013 177 1207 ...
## $ PER : num 12.3 13.1 19.2 13.3 14.8 11.5 21.9 23 10.7 17.6 ...
## $ TS. : num 0.5 0.524 0.535 0.467 0.486 0.49 0.527 0.57 0.505 0.477 ...
## $ X3PAr : num 0.118 0.817 0.019 0.008 0.022 0.38 0.215 0.255 0.219 0.258 ...
## $ FTr : num 0.242 0.103 0.236 0.132 0.149 0.23 0.338 0.392 0.288 0.338 ...
## $ ORB. : num 2.4 1.2 4.5 6.5 6.1 2.6 3.5 2.5 1.2 2.3 ...
## $ DRB. : num 11.5 14.8 25.8 32.1 31.3 7.2 11.8 13.1 14.5 16.5 ...
## $ TRB. : num 7 8.1 15.5 19.3 18.7 4.9 7.8 7.9 7.8 9.1 ...
## $ AST. : num 12.1 28.4 14.4 12.4 13 8.3 23.7 29.7 34.6 29.9 ...
## $ STL. : num 1.5 3.1 2 2 2.5 1.9 1.6 1.8 1.9 2 ...
## $ BLK. : num 1.5 0.6 2.4 3 1.2 0.8 0.6 0.6 0.4 0.5 ...
## $ TOV. : num 11.5 24.2 10.5 15.5 13.3 10.6 11.7 13.3 29.2 13.5 ...
## $ USG. : num 18.6 12.7 24.5 18.9 18.1 21.2 35.7 31.9 28.7 34.9 ...
## $ OWS : num 0.6 0.9 1.8 -0.7 0.2 0 4.2 8.4 -0.5 -0.4 ...
## $ DWS : num 0.9 2.1 3.8 1.9 1.4 0.3 2 2.6 0.1 0.6 ...
## $ WS : num 1.6 3 5.6 1.2 1.6 0.3 6.2 10.9 -0.4 0.2 ...
## $ WS.48 : num 0.055 0.104 0.133 0.054 0.089 0.06 0.132 0.174 -0.097 0.006 ...
## $ OBPM : num -1.2 0.4 -1.2 -4.7 -2.8 -2.3 3.6 5.2 -4.7 1.6 ...
## $ DBPM : num 0.7 2.1 2.3 2.8 2.5 -2 -1.4 -0.5 -1.3 -1.3 ...
## $ BPM : num -0.5 2.5 1 -1.9 -0.4 -4.2 2.3 4.7 -5.9 0.3 ...
## $ VORP : num 0.5 1.5 1.5 0 0.3 -0.2 2.4 5.1 -0.2 0.7 ...
## $ FG : int 201 99 422 157 125 37 574 738 31 266 ...
## $ FGA : int 451 273 850 356 275 100 1336 1595 73 713 ...
## $ FG. : num 0.446 0.363 0.496 0.441 0.455 0.37 0.43 0.463 0.425 0.373 ...
## $ X3P : int 14 79 2 0 1 13 87 132 3 54 ...
## $ X3PA : int 53 223 16 3 6 38 287 407 16 184 ...
## $ X3P. : num 0.264 0.354 0.125 0 0.167 0.342 0.303 0.324 0.188 0.293 ...
## $ X2P : int 187 20 420 157 124 24 487 606 28 212 ...
## $ X2PA : int 398 50 834 353 269 62 1049 1188 57 529 ...
## $ X2P. : num 0.47 0.4 0.504 0.445 0.461 0.387 0.464 0.51 0.491 0.401 ...
## $ eFG. : num 0.461 0.507 0.498 0.441 0.456 0.435 0.462 0.504 0.445 0.411 ...
## $ FT : int 83 22 158 38 34 21 381 525 18 196 ...
## $ FTA : int 109 28 201 47 41 23 451 626 21 241 ...
## $ FT. : num 0.761 0.786 0.786 0.809 0.829 0.913 0.845 0.839 0.857 0.813 ...
## $ ORB : int 29 15 75 60 46 6 66 66 2 26 ...
## $ DRB : int 142 183 455 298 239 17 247 367 24 173 ...
## $ TRB : int 171 198 530 358 285 23 313 433 26 199 ...
## $ AST : int 107 264 159 82 69 14 264 469 38 197 ...
## $ STL : int 41 82 78 43 41 10 69 106 7 47 ...
## $ BLK : int 29 10 62 40 13 3 18 25 1 7 ...
## $ TOV : int 65 91 110 69 45 13 204 287 34 128 ...
## $ PF : int 88 83 154 123 96 12 105 173 9 65 ...
## $ PTS : int 499 299 1004 352 285 108 1616 2133 83 782 ...
## $ PPG : num 10.18 6.23 14.76 6.52 6.79 ...
## $ MPG : num 28.1 28.7 29.7 20.5 20.3 ...
## $ PFG : num 1.8 1.73 2.26 2.28 2.29 ...
## $ FTPG : num 1.694 0.458 2.324 0.704 0.81 ...
## $ X3PPG : num 0.2857 1.6458 0.0294 0 0.0238 ...
## $ X2PPG : num 3.816 0.417 6.176 2.907 2.952 ...
## $ TRBPG : num 3.49 4.12 7.79 6.63 6.79 ...
## $ ASTPG : num 2.18 5.5 2.34 1.52 1.64 ...
## $ STLPG : num 0.837 1.708 1.147 0.796 0.976 ...
## $ BLKPG : num 0.592 0.208 0.912 0.741 0.31 ...
## $ TOVPG : num 1.33 1.9 1.62 1.28 1.07 ...
## $ stayed : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
## Player height weight born
## Lance Stephenson: 12 Min. :175.0 Min. : 68.00 Min. :1969
## Marcus Thornton : 10 1st Qu.:193.0 1st Qu.: 90.00 1st Qu.:1984
## Wayne Ellington : 10 Median :203.0 Median : 99.00 Median :1988
## Anderson Varejao: 9 Mean :200.8 Mean : 99.96 Mean :1987
## D.J. Augustin : 9 3rd Qu.:208.0 3rd Qu.:108.00 3rd Qu.:1990
## Jameer Nelson : 9 Max. :221.0 Max. :139.00 Max. :1997
## (Other) :2195
## Year Pos Age Tm_curr_s Tm_next_s
## Min. :2012 C :438 Min. :19.0 DEN : 95 NOH : 98
## 1st Qu.:2013 PF:439 1st Qu.:23.0 MEM : 93 ATL : 90
## Median :2015 PG:477 Median :26.0 TOR : 86 DEN : 90
## Mean :2015 SF:425 Mean :26.4 NOH : 84 MIL : 88
## 3rd Qu.:2016 SG:475 3rd Qu.:29.0 SAC : 83 BKN : 85
## Max. :2017 Max. :40.0 OKC : 82 WAS : 85
## (Other):1731 (Other):1718
## year_start year_end career_year age_at_car_start
## Min. :1995 Min. :2013 Min. : 0.000 Min. :19.00
## 1st Qu.:2007 1st Qu.:2017 1st Qu.: 1.000 1st Qu.:21.00
## Median :2010 Median :2018 Median : 4.000 Median :22.00
## Mean :2010 Mean :2017 Mean : 4.874 Mean :22.69
## 3rd Qu.:2013 3rd Qu.:2018 3rd Qu.: 8.000 3rd Qu.:24.00
## Max. :2017 Max. :2018 Max. :19.000 Max. :37.00
##
## salary adj salary_infl G
## Min. : 0.00882 Min. :1.025 Min. : 0.00935 Min. : 6.00
## 1st Qu.: 1.41399 1st Qu.:1.047 1st Qu.: 1.50817 1st Qu.:37.00
## Median : 3.33333 Median :1.060 Median : 3.53553 Median :60.00
## Mean : 5.68117 Mean :1.060 Mean : 5.99965 Mean :54.26
## 3rd Qu.: 8.00000 3rd Qu.:1.078 3rd Qu.: 8.45824 3rd Qu.:74.00
## Max. :34.68255 Max. :1.094 Max. :35.54961 Max. :82.00
##
## GS MP PER TS.
## Min. : 0.00 Min. : 23.0 Min. :-16.6 Min. :0.1130
## 1st Qu.: 2.00 1st Qu.: 608.2 1st Qu.: 10.8 1st Qu.:0.4950
## Median :15.00 Median :1243.0 Median : 13.6 Median :0.5310
## Mean :27.21 Mean :1289.5 Mean : 13.8 Mean :0.5269
## 3rd Qu.:54.00 3rd Qu.:1937.0 3rd Qu.: 16.6 3rd Qu.:0.5630
## Max. :82.00 Max. :3167.0 Max. : 31.6 Max. :0.7680
##
## X3PAr FTr ORB. DRB.
## Min. :0.00000 Min. :0.0000 Min. : 0.000 Min. : 2.60
## 1st Qu.:0.03925 1st Qu.:0.1730 1st Qu.: 2.000 1st Qu.: 9.90
## Median :0.27100 Median :0.2470 Median : 3.700 Median :13.65
## Mean :0.26742 Mean :0.2712 Mean : 5.222 Mean :14.82
## 3rd Qu.:0.42375 3rd Qu.:0.3390 3rd Qu.: 8.000 3rd Qu.:18.90
## Max. :0.94300 Max. :1.2190 Max. :24.800 Max. :45.10
##
## TRB. AST. STL. BLK.
## Min. : 1.90 Min. : 0.00 Min. :0.000 Min. : 0.000
## 1st Qu.: 6.20 1st Qu.: 6.80 1st Qu.:1.200 1st Qu.: 0.500
## Median : 8.80 Median :10.45 Median :1.500 Median : 1.100
## Mean :10.02 Mean :13.75 Mean :1.603 Mean : 1.645
## 3rd Qu.:13.30 3rd Qu.:18.48 3rd Qu.:2.000 3rd Qu.: 2.200
## Max. :27.60 Max. :57.30 Max. :5.100 Max. :15.100
##
## TOV. USG. OWS DWS
## Min. : 0.00 Min. : 3.70 Min. :-3.300 Min. :-0.200
## 1st Qu.:10.50 1st Qu.:15.30 1st Qu.: 0.100 1st Qu.: 0.400
## Median :12.90 Median :18.70 Median : 0.900 Median : 1.000
## Mean :13.56 Mean :19.06 Mean : 1.494 Mean : 1.313
## 3rd Qu.:15.90 3rd Qu.:22.20 3rd Qu.: 2.300 3rd Qu.: 1.900
## Max. :62.50 Max. :41.70 Max. :14.800 Max. : 6.400
##
## WS WS.48 OBPM DBPM
## Min. :-2.100 Min. :-0.47700 Min. :-16.7000 Min. :-8.5000
## 1st Qu.: 0.700 1st Qu.: 0.05000 1st Qu.: -2.3000 1st Qu.:-1.5000
## Median : 2.000 Median : 0.08900 Median : -0.7000 Median :-0.3000
## Mean : 2.806 Mean : 0.08739 Mean : -0.8072 Mean :-0.2021
## 3rd Qu.: 4.100 3rd Qu.: 0.12575 3rd Qu.: 0.7000 3rd Qu.: 1.0000
## Max. :19.300 Max. : 0.32500 Max. : 12.4000 Max. : 7.5000
##
## BPM VORP FG FGA
## Min. :-20.700 Min. :-1.6000 Min. : 1.0 Min. : 3.0
## 1st Qu.: -2.900 1st Qu.:-0.1000 1st Qu.: 75.0 1st Qu.: 172.2
## Median : -1.000 Median : 0.2000 Median :172.0 Median : 374.0
## Mean : -1.009 Mean : 0.6984 Mean :203.5 Mean : 447.3
## 3rd Qu.: 0.900 3rd Qu.: 1.1000 3rd Qu.:294.0 3rd Qu.: 643.8
## Max. : 15.600 Max. :12.4000 Max. :849.0 Max. :1941.0
##
## FG. X3P X3PA X3P.
## Min. :0.0630 Min. : 0.00 Min. : 0.0 Min. :0.0000
## 1st Qu.:0.4070 1st Qu.: 2.00 1st Qu.: 8.0 1st Qu.:0.2527
## Median :0.4400 Median : 24.00 Median : 73.0 Median :0.3330
## Mean :0.4468 Mean : 41.96 Mean :117.6 Mean :0.2959
## 3rd Qu.:0.4828 3rd Qu.: 67.00 3rd Qu.:191.0 3rd Qu.:0.3770
## Max. :0.7360 Max. :402.00 Max. :886.0 Max. :1.0000
## NA's :198
## X2P X2PA X2P. eFG.
## Min. : 0.0 Min. : 2.0 Min. :0.0000 Min. :0.0630
## 1st Qu.: 53.0 1st Qu.: 114.2 1st Qu.:0.4420 1st Qu.:0.4580
## Median :124.5 Median : 260.0 Median :0.4770 Median :0.4930
## Mean :161.5 Mean : 329.8 Mean :0.4784 Mean :0.4927
## 3rd Qu.:230.8 3rd Qu.: 469.0 3rd Qu.:0.5170 3rd Qu.:0.5298
## Max. :706.0 Max. :1421.0 Max. :1.0000 Max. :0.8130
##
## FT FTA FT. ORB
## Min. : 0.00 Min. : 0.0 Min. :0.0000 Min. : 0.00
## 1st Qu.: 27.00 1st Qu.: 38.0 1st Qu.:0.6740 1st Qu.: 17.00
## Median : 64.00 Median : 88.0 Median :0.7625 Median : 37.00
## Mean : 94.44 Mean :124.5 Mean :0.7393 Mean : 58.42
## 3rd Qu.:129.00 3rd Qu.:174.0 3rd Qu.:0.8240 3rd Qu.: 77.75
## Max. :746.00 Max. :881.0 Max. :1.0000 Max. :440.00
## NA's :16
## DRB TRB AST STL
## Min. : 1.0 Min. : 2.00 Min. : 0.0 Min. : 0.00
## 1st Qu.: 70.0 1st Qu.: 91.25 1st Qu.: 33.0 1st Qu.: 16.00
## Median :144.5 Median : 184.00 Median : 76.0 Median : 34.00
## Mean :173.2 Mean : 231.66 Mean :120.9 Mean : 41.53
## 3rd Qu.:234.8 3rd Qu.: 313.00 3rd Qu.:159.8 3rd Qu.: 58.75
## Max. :829.0 Max. :1226.00 Max. :906.0 Max. :191.00
##
## BLK TOV PF PTS
## Min. : 0.00 Min. : 0.0 Min. : 1.0 Min. : 2.0
## 1st Qu.: 6.00 1st Qu.: 30.0 1st Qu.: 55.0 1st Qu.: 196.0
## Median : 16.00 Median : 60.0 Median :104.0 Median : 447.5
## Mean : 26.28 Mean : 74.8 Mean :106.9 Mean : 543.4
## 3rd Qu.: 34.00 3rd Qu.:105.0 3rd Qu.:152.0 3rd Qu.: 781.8
## Max. :269.00 Max. :464.0 Max. :301.0 Max. :2593.0
##
## PPG MPG PFG FTPG
## Min. : 0.250 Min. : 2.875 Min. :0.08333 Min. :0.0000
## 1st Qu.: 4.873 1st Qu.:15.507 1st Qu.:1.38117 1st Qu.:0.6307
## Median : 7.971 Median :22.076 Median :1.87500 Median :1.1752
## Mean : 9.118 Mean :22.078 Mean :1.85805 Mean :1.5683
## 3rd Qu.:12.372 3rd Qu.:29.285 3rd Qu.:2.33333 3rd Qu.:2.0479
## Max. :32.012 Max. :39.426 Max. :4.41176 Max. :9.2099
##
## X3PPG X2PPG TRBPG ASTPG
## Min. :0.00000 Min. :0.000 Min. : 0.1818 Min. : 0.0000
## 1st Qu.:0.04013 1st Qu.:1.293 1st Qu.: 2.1434 1st Qu.: 0.7867
## Median :0.55000 Median :2.266 Median : 3.2917 Median : 1.4136
## Mean :0.71006 Mean :2.710 Mean : 3.9135 Mean : 2.0656
## 3rd Qu.:1.16610 3rd Qu.:3.754 3rd Qu.: 5.1031 3rd Qu.: 2.6974
## Max. :5.08861 Max. :9.406 Max. :14.9512 Max. :11.6528
##
## STLPG BLKPG TOVPG stayed
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Mode :logical
## 1st Qu.:0.4029 1st Qu.:0.1392 1st Qu.:0.7143 FALSE:984
## Median :0.6281 Median :0.2945 Median :1.1163 TRUE :1270
## Mean :0.7090 Mean :0.4404 Mean :1.2857
## 3rd Qu.:0.9317 3rd Qu.:0.5625 3rd Qu.:1.6798
## Max. :2.5333 Max. :3.6849 Max. :5.7284
##
The dataset consists of more than 2,000 observations of 73 variables. We will not be exploring all of them, however. Some of them were calculated from others, e.g. field goal percentage (FG.) is calculated from field goals (FG) and field goal attempts (FGA). We will concentrate on commonly used, easy-to-understand basketball statistics and additionally have a closer look at some advanced performance statistics. We will try to avoid using variables which are derived from other variables. The variables we will explore are salary adjusted for inflation, career year, player’s age, number of games played for the team in a given season, and various per-game statistics: points, minutes played, two- and three-point field goals, free throws, assists, steals, rebounds, turnovers and blocks. Furthermore, we will explore true shooting percentage, a measure of shooting efficiency that relates points scored (including free throws) to the shooting attempts made. Additionally, we will have a look at player efficiency rating (PER). Finally, we will look at whether players stayed with or left the team during or after the season. Since the performance statistics mentioned above are influenced by the player’s position, we may improve our understanding of the data by looking at the statistics for different positions separately later in the analysis.
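To make the two shooting-efficiency measures concrete, here is a minimal sketch in base R. It assumes the standard definitions from the basketball-reference glossary (effective field goal percentage as (FG + 0.5 * 3P) / FGA, true shooting percentage as PTS / (2 * (FGA + 0.44 * FTA))); the variable names are illustrative and the totals are copied from the first row of the structure listing above.

```r
# Season totals for one player-season, taken from the first row of the
# structure listing (FG, FGA, X3P, FTA, PTS). Variable names are illustrative.
fg  <- 201   # field goals made
fga <- 451   # field goal attempts
x3p <- 14    # three-point field goals made
fta <- 109   # free throw attempts
pts <- 499   # total points

# Effective field goal percentage: a made three counts 1.5 times a made two.
efg <- (fg + 0.5 * x3p) / fga

# True shooting percentage: points per two shooting attempts, where free
# throw attempts enter with the conventional weight of 0.44.
ts <- pts / (2 * (fga + 0.44 * fta))

print(round(c(eFG = efg, TS = ts), 3))
```

Under these definitions the results round to the eFG. and TS. values shown for that row in the listing (0.461 and 0.5), which suggests the dataset follows the same formulas.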
Transforming the salary using the square root function allows a closer look at the data in the long tail and appears to result in a bimodal distribution with peaks at around 0.1 million USD and just above 1 million USD. Let’s have a look at the age and career year of the players.
The number of players peaks at an age of 23-24 years and decreases from there. Only a small portion of players are older than 35.
Most players also appear to be in the early years of their NBA career, with the largest number of players in their first career year. The number of players in later career years gradually decreases. Only a small portion of players achieve a career longer than 15 years. Let’s have a look at the distributions of the performance indicators (average statistics per game) next.
As we can see from the plots of the points-related statistics, transforming the scale with the square root and log functions gives the best view of the long tails. Applying the square root function to points per game and 2-point field goals per game yields a bell-shaped curve that resembles a normal distribution. For 3-point field goals per game there is a surprisingly large number of players who haven’t scored a single 3-pointer. For free throws per game the distribution is right skewed, with some outliers making many more free throws than the average.
Plotting further important basketball statistics such as assists, steals, total rebounds and blocks shows a similar picture as for the scoring statistics: transforming the scale using the square root and log functions helps to understand the longer right tails. With blocks we see a similar story as with 3-point field goals: there is a large number of players who blocked no shots.
Lastly, there are plots of the negative performance indicators, personal fouls and turnovers. The distribution of personal fouls per game already has a bell curve shape without transforming the scale. Turnovers, which indicate losing the ball, follow a bell curve shape after transforming the scale with the square root function.
As for effective field goal percentage and true shooting percentage, both look like a bell curve with high kurtosis, with the most common values at about 0.5 for effective field goal percentage and slightly above 0.5 for true shooting percentage. This makes sense since effective field goal percentage measures shooting accuracy using shots from the field only (both 2-point and 3-point shots), while true shooting percentage additionally takes free throws into account.
Finally, from the distribution plot of the player efficiency rating we can conclude that the most common PER level lies between 10 and 15 and only a few players have achieved a PER above 20. PER is a widely used indicator of a player’s performance. It is calculated with a dedicated formula that combines a wide range of an individual player’s performance indicators, weighted by specially developed factors. Please refer to https://www.basketball-reference.com/about/per.html for further details.
Let’s have a look at the salary distribution of players with a PER below 15 as well as above 20, together with the salary histogram of all players.
The salary histogram for lower performing players resembles the salary histogram for all players but has a thinner tail. The salaries of the best performing players range from just below 1 million USD up to more than 30 million USD: there is still a high number of top performers earning comparatively low salaries, while only a few reach a salary above 30 million USD.
## salary_infl
## Min. : 0.00935
## 1st Qu.: 1.40474
## Median : 3.23313
## Mean : 5.05729
## 3rd Qu.: 6.93100
## Max. :32.05096
## salary_infl
## Min. : 0.814
## 1st Qu.: 9.742
## Median :16.681
## Mean :15.895
## 3rd Qu.:21.288
## Max. :35.550
We can see that the salary ranges for players with a PER above 20 and for players with a PER at or below 20 differ a lot. The mean salary of players with a PER above 20 amounts to approx. 15.9 million USD, which is much higher than the mean salary of players with a PER at or below 20, which is only about 5.1 million USD.
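The comparison of the two PER groups can be sketched as follows. This is a minimal illustration, not the code used for the summaries above: the data frame `players` and all of its values are made up, and only the column names `PER` and `salary_infl` follow the dataset.

```r
# Hypothetical toy data; values are invented for illustration only.
players <- data.frame(
  PER         = c(9.5, 12.0, 14.8, 19.9, 20.0, 21.5, 24.0, 27.3),
  salary_infl = c(0.9, 1.5, 3.2, 7.0, 5.5, 9.8, 16.4, 22.1)  # million USD
)

# Split at the threshold used in the text: PER above 20 vs. at or below 20.
high <- players$PER > 20

mean_high <- mean(players$salary_infl[high])
mean_low  <- mean(players$salary_infl[!high])

# Because the two groups are complementary, the overall mean is the
# size-weighted average of the two group means.
weighted <- (sum(high) * mean_high + sum(!high) * mean_low) / nrow(players)

print(c(mean_high = mean_high, mean_low = mean_low, overall = weighted))
```

The same identity can be applied to the reported figures: if the two summaries above cover complementary groups, an overall mean of about 6.00 with group means of about 5.06 and 15.90 over 2254 observations implies roughly 196 observations with a PER above 20.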
##
## FALSE TRUE
## 984 1270
Overall, over the past ten years there are 984 player-season observations in which the player changed teams during or after the season and 1270 in which the player stayed with the same team as in the previous season. I wonder if changing teams is linked to a player’s salary.
The dataset contains data on the past 10 NBA seasons, including different performance indicators and player data. Overall, there are 2254 observations. The dataset is structured in such a way that the performance statistics of a given season correspond to a player’s salary in the next season, since salaries in the NBA are determined before the season starts. The mean salary (adjusted for inflation) is 6 million USD and the median salary is 3.5 million USD. The mean player age is 26 years and the mean career year among active players over the last ten years is about 4.9. Performance statistics such as points from the field, free throws, blocks, steals, assists, turnovers and rebounds all have a long right tail: a small number of players far exceed the average.
The main features in the dataset are player salaries along with different performance indicators. Salary levels differ dramatically, from less than one million USD up to over 30 million USD. I would like to determine how player performance is linked to salary level.
Other features which may be linked to the salary level are player’s age and experience in the NBA as well as whether the player stays with the same team as in the previous season.
I created variables indicating per game performance since in the starting datasets only overall performance of the player for a team in a given season was given along with the number of games. I also created a variable named ‘stayed’ indicating whether the player stayed with the same team as in the previous season.
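A minimal sketch of how such derived variables can be built in base R is shown below. The G, PTS and MP values come from the first three rows of the structure listing; the team abbreviations are hypothetical, and defining `stayed` as equality of `Tm_curr_s` and `Tm_next_s` is an assumption based on the column names.

```r
# Toy rows: G, PTS and MP from the structure listing; team codes are made up.
stats <- data.frame(
  G         = c(49, 48, 68),
  PTS       = c(499, 299, 1004),
  MP        = c(1378, 1379, 2022),
  Tm_curr_s = c("AAA", "BBB", "CCC"),
  Tm_next_s = c("AAA", "DDD", "CCC"),
  stringsAsFactors = FALSE
)

# Per-game averages: season totals divided by games played.
stats$PPG <- stats$PTS / stats$G
stats$MPG <- stats$MP / stats$G

# Assumed definition: the player stayed if both team codes match.
# as.character() keeps the comparison valid if the columns are factors.
stats$stayed <- as.character(stats$Tm_curr_s) == as.character(stats$Tm_next_s)

print(stats[, c("PPG", "MPG", "stayed")])
```

The resulting PPG and MPG values agree with the first three entries of the corresponding columns in the listing after rounding.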
I prepared the data by matching salary data with game statistics and filtered it so that only observations with more than 5 games per season are considered, in order to remove outliers. Furthermore, I adjusted salary data for inflation since player performance indicators are comparable over the years (at least as long as the rules of the game do not change significantly, which holds for the time frame being explored).
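The filtering and inflation adjustment can be sketched as below. This is an illustration under assumptions: the relation `salary_infl = salary * adj` is inferred from the columns in the structure listing rather than stated in the text, the data frame name `raw` is made up, and the middle row is a hypothetical player with too few games.

```r
# Toy data: rows 1 and 3 use salary and adj values from the listing,
# row 2 is a hypothetical observation that should be filtered out.
raw <- data.frame(
  G      = c(49, 4, 68),
  salary = c(1.98, 0.50, 12.44),  # million USD, nominal
  adj    = c(1.09, 1.09, 1.08)    # assumed multiplier to 2018 dollars
)

# Keep only observations with more than 5 games.
kept <- raw[raw$G > 5, ]

# Express salaries in 2018 dollars.
kept$salary_infl <- kept$salary * kept$adj

print(kept)
```

Because the adj values printed in the listing are rounded, the products only approximately match the salary_infl column shown there.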
## salary_infl Age career_year PPG MPG
## salary_infl 1.0000000 0.18206370 0.31623594 0.68835303 0.6131324
## Age 0.1820637 1.00000000 0.89726938 0.04890164 0.1095504
## career_year 0.3162359 0.89726938 1.00000000 0.20451996 0.2526185
## PPG 0.6883530 0.04890164 0.20451996 1.00000000 0.8713495
## MPG 0.6131324 0.10955042 0.25261846 0.87134947 1.0000000
## FTPG 0.6419619 0.01117289 0.15108960 0.89564033 0.7233438
## ASTPG 0.4121815 0.11532223 0.18895710 0.60711900 0.6315992
## STLPG 0.4324244 0.03115063 0.14064493 0.64255430 0.7304698
## TRBPG 0.5299388 0.03916293 0.16421692 0.53220711 0.5981819
## BLKPG 0.3208842 -0.03299608 0.04607904 0.25307385 0.3064488
## TOVPG 0.5575525 0.02648850 0.15795630 0.81278307 0.7763511
## PFG 0.3654946 0.02644604 0.11982103 0.51518656 0.6556539
## eFG. 0.2226861 0.14313091 0.14235545 0.23875585 0.2290123
## TS. 0.2966609 0.14541488 0.15425917 0.37189518 0.3195772
## PER 0.5610399 0.05801093 0.16211148 0.74143045 0.5618378
## FTPG ASTPG STLPG TRBPG BLKPG
## salary_infl 0.64196190 0.41218148 0.43242435 0.52993882 0.32088421
## Age 0.01117289 0.11532223 0.03115063 0.03916293 -0.03299608
## career_year 0.15108960 0.18895710 0.14064493 0.16421692 0.04607904
## PPG 0.89564033 0.60711900 0.64255430 0.53220711 0.25307385
## MPG 0.72334384 0.63159924 0.73046978 0.59818192 0.30644880
## FTPG 1.00000000 0.55802405 0.57126742 0.49416844 0.24719732
## ASTPG 0.55802405 1.00000000 0.67234591 0.11325162 -0.09938710
## STLPG 0.57126742 0.67234591 1.00000000 0.33391745 0.09576598
## TRBPG 0.49416844 0.11325162 0.33391745 1.00000000 0.70108478
## BLKPG 0.24719732 -0.09938710 0.09576598 0.70108478 1.00000000
## TOVPG 0.77739879 0.81386357 0.69376364 0.44329156 0.19078068
## PFG 0.44025464 0.27305813 0.48509979 0.67166134 0.52995119
## eFG. 0.11986096 -0.02672048 0.08103493 0.30032076 0.28896259
## TS. 0.32669282 0.07433784 0.16265372 0.31709519 0.26998314
## PER 0.71243871 0.41025737 0.46259621 0.60926410 0.44287173
## TOVPG PFG eFG. TS. PER
## salary_infl 0.5575525 0.36549462 0.22268608 0.29666087 0.56103992
## Age 0.0264885 0.02644604 0.14313091 0.14541488 0.05801093
## career_year 0.1579563 0.11982103 0.14235545 0.15425917 0.16211148
## PPG 0.8127831 0.51518656 0.23875585 0.37189518 0.74143045
## MPG 0.7763511 0.65565391 0.22901235 0.31957717 0.56183783
## FTPG 0.7773988 0.44025464 0.11986096 0.32669282 0.71243871
## ASTPG 0.8138636 0.27305813 -0.02672048 0.07433784 0.41025737
## STLPG 0.6937636 0.48509979 0.08103493 0.16265372 0.46259621
## TRBPG 0.4432916 0.67166134 0.30032076 0.31709519 0.60926410
## BLKPG 0.1907807 0.52995119 0.28896259 0.26998314 0.44287173
## TOVPG 1.0000000 0.52394556 0.05286990 0.16997823 0.55411624
## PFG 0.5239456 1.00000000 0.26106175 0.29412372 0.41774843
## eFG. 0.0528699 0.26106175 1.00000000 0.94160039 0.52632818
## TS. 0.1699782 0.29412372 0.94160039 1.00000000 0.64303545
## PER 0.5541162 0.41774843 0.52632818 0.64303545 1.00000000
Salary is most strongly correlated with points per game (0.69), free throws per game (0.64), minutes per game (0.61), PER (0.56), turnovers per game (0.56) and total rebounds per game (0.53). Minutes played per game also correlate strongly with many per-game indicators, which makes sense since players have more chances to score, assist etc. if they play more. At the same time, the number of blocks per game and shooting accuracy, represented by effective field goal and true shooting percentage, are not strongly related to the number of minutes played. The number of personal fouls per game is most correlated with rebounds, which also makes sense since rebounding fouls are common. Turnovers are also strongly correlated with points per game, which is a bit surprising to me and I don’t have a good explanation for it. It might be that players who have the ball most of the time score many points and at the same time have a higher chance of losing the ball. Age and career year show only weak correlations with salary as well as with the performance indicators, so experience surely helps in many aspects of the game but we cannot yet see it in the variables explored here. Let’s have a closer look at some per-game indicators using a plot matrix.
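A correlation matrix like the one discussed above can be produced with base R's `cor`, which computes Pearson correlations by default (whether the matrix above used Pearson is an assumption). The sketch below uses only the first five rows visible in the structure listing, so the resulting numbers are not meaningful and will not match the matrix above; it only illustrates the mechanics.

```r
# First five rows of three columns from the structure listing.
toy <- data.frame(
  salary_infl = c(2.16, 3.38, 13.42, 12.73, 9.01),
  PPG         = c(10.18, 6.23, 14.76, 6.52, 6.79),
  MPG         = c(28.1, 28.7, 29.7, 20.5, 20.3)
)

# Pairwise Pearson correlations between all columns.
cm <- cor(toy)

# Rank the other variables by their correlation with salary.
others <- colnames(cm) != "salary_infl"
ranked <- sort(cm["salary_infl", others], decreasing = TRUE)

print(round(cm, 2))
print(names(ranked))
```

On the full dataset, columns with missing values (such as X3P.) would need an explicit choice for the `use` argument of `cor`, for example `use = "complete.obs"`.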
I tried to build a linear regression model for predicting salaries from some of the variables; it will be introduced in the multivariate analysis section. I didn’t use variables which obviously influence each other, e.g. I didn’t use minutes per game (MPG), since it influences other variables such as points per game (PPG) or 2-point field goals per game (X2PPG). This is also the reason why I didn’t use player efficiency rating (PER): it is calculated from other variables I used in my model, although it has a high correlation with salary, as can be seen from the correlation matrix above.
Since the plots presented above use a linear scale, I wonder what the relationship between salary and the performance indicators would look like when adjusting the scale with the log or square root function; as we’ve seen in the section on univariate plots, some variables are better explored on a transformed scale.
Among the scoring indicators, 2-point field goals and free throws are most strongly related to salary. 3-point field goals are not strongly related to salary: there are many players with a relatively high salary and relatively low 3-point scoring. Remember, from the histogram of 3-point field goals we learned that there are many players who didn’t score a single 3-pointer.
Among other performance indicators total rebounds have the strongest relationship with the salary.
Interestingly, negative performance, represented by the number of turnovers per game and personal fouls per game, is also positively related to salary.
Plotting the shooting accuracy variables, effective field goal percentage and true shooting percentage, reveals that there are very few players whose shooting accuracy is low but whose salary is high (the upper left corners of both plots contain only a few points). At the same time there are plenty of players whose shooting accuracy is relatively high but whose salary is low.
Next, let us look at categorical variables such as position and staying with the team, and at the salary distribution across them. Additionally, let’s look at salary vs. age and career year using box plots, which might deliver a better view than a scatter plot.
It looks like staying with the same team is connected to a bigger salary measured by mean salary and 3rd quartile where both are higher for those who stay.
Also in the first three career years salaries are smaller than in later years.
Differences in salaries between positions also exist. Now let us look into the position variable in more detail. Who scores most points?
It looks like point guards and shooting guards score most on average, which makes sense, although the difference to other positions is not dramatically large. Who does most rebounding?
Do different positions require different physical characteristics like height and weight?
Big guys like centers and power forwards rebound most, which is also no surprise. Finally, do experienced players score more or draw more fouls?
Maybe younger players score less because they have less time to score?
To see if that’s true, let us check whether this difference still exists when we use points per minute instead of points per game and plot it by career year.
Here we saw that players in their first and second career years seem to play a more cautious game, drawing fewer fouls and thus having fewer free throws per game as well as fewer points per game. Part of this can be explained by the fact that younger players spend fewer minutes on the court, as indicated by the plot of minutes per game by career year. Plotting points per minute for different career years reveals that the difference in points in the first two years is not that big, although it exists. Finally, for players who have passed their 10th career year it is harder to make judgements about averages since there are fewer data points for them, as we saw when plotting the career year variable in the univariate plots section above.
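The points-per-minute comparison can be sketched in base R as follows. The rows are the first ten values of career_year, PTS and MP from the structure listing, so they cover only late career years and do not reproduce the plot; the column name `PPM` and the use of the median as the per-year summary are illustrative choices.

```r
# First ten rows of career_year, PTS and MP from the structure listing.
d <- data.frame(
  career_year = c(17, 17, 17, 18, 19, 16, 15, 16, 17, 18),
  PTS         = c(499, 299, 1004, 352, 285, 108, 1616, 2133, 83, 782),
  MP          = c(1378, 1379, 2022, 1109, 854, 273, 2232, 3013, 177, 1207)
)

# Points per minute removes the effect of playing time from scoring.
d$PPM <- d$PTS / d$MP

# Median points per minute for each career year (illustrative summary).
ppm_by_year <- tapply(d$PPM, d$career_year, median)

print(round(ppm_by_year, 3))
```

A box plot of the same quantity could be drawn with `boxplot(PPM ~ career_year, data = d)`.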
I wonder how the relationships between the performance indicators and salary, explored above for all players, differ between positions. Let us have a closer look at this later in the multivariate analysis chapter.
I observed some interesting relationships between salary level and game performance indicators. In most cases the relationships observed are not linear. The relationships between salary and player experience, expressed by the age and career year variables, were not as strong as I expected.
To get a better understanding of the data I looked at how two important variables, points per game and free throws per game, differ across career years by creating box plots. Both plots revealed that at the beginning of their career players have a lower number of points and free throws per game. Then I created a plot of the number of minutes played for different career years, which revealed a similar pattern: players with less experience in the NBA spend less time in the game, which may explain their lower scoring. Plotting points per minute reveals that the difference in scoring for younger players is not that big, although it still exists.
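The strongest relationship observed was between salary and points per game, with a correlation of approximately 0.69. Free throws per game (approximately 0.64) and minutes per game (approximately 0.61) had the second and third largest correlations with salary among the variables in the correlation matrix above; total rebounds per game (approximately 0.53) ranked behind PER and turnovers per game (both approximately 0.56).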
There are fewer centers who earn under 5 million USD than players at other positions. There are more shooting guards than players at other positions who earn under 2.5 million USD, and there are fewer shooting guards who earn between 12.5 and 20 million USD than players at other positions.
Among those who earn above 2.5 million USD there are fewer players who change teams than players who stay.
From the plots of the scoring indicators by position we can conclude that the relationships between salary and both 2-point field goals and free throws are definitely there, independent of the player’s position. As for 3-point field goals, where we saw before that there is no such strong relationship, we now know this is mainly because of centers. For point guards the number of 3-point field goals is somewhat related to salary, as shown in the plot.
Now we see that rebounds per game and blocks per game are the variables related to the salary of centers. At the same time blocks are not important for shooting guards. Even if a relationship exists for multiple positions, as in the case of assists for point guards, shooting guards and small forwards, its strength differs between positions.
Again, with negative performance indicators, personal fouls per game and turnovers per game across positions we see the same positive relationship with the salary as for all players together.
Lastly, for true shooting we see almost no relationship with salary for center players and some relationship for point guards and shooting guards.
Plots of salary vs. game performance indicators by position, divided into two groups (those who stayed and those who left), reveal a different picture for different positions and for the stayed / left groups. Therefore not all indicators are equally suited to evaluating player performance across different positions. In many of the above plots the smoothing line for those who change teams lies below the line for those who stay, which means that the variable ‘stayed’ could also be used for forecasting salaries.
I tried to build a linear regression model for predicting salary based on the average per-game performance indicators described above. I also included the categorical variable ‘stayed’ as well as the career year variable in the model. Let us look at how the model works for all players as well as for different positions separately by subsetting the data. First, I subset the data to observations where the game performance indicators used in the regression are above zero, since some variables will be transformed using the log function.
## [1] 1699 74
## Player height weight born
## Lance Stephenson: 11 Min. :175 Min. : 68.00 Min. :1969
## Wayne Ellington : 10 1st Qu.:193 1st Qu.: 90.00 1st Qu.:1984
## Michael Beasley : 9 Median :201 Median : 97.00 Median :1988
## Andre Miller : 8 Mean :199 Mean : 97.18 Mean :1987
## Arron Afflalo : 8 3rd Qu.:206 3rd Qu.:104.00 3rd Qu.:1991
## Brandon Jennings: 8 Max. :221 Max. :131.00 Max. :1997
## (Other) :1645
## Year Pos Age Tm_curr_s Tm_next_s
## Min. :2012 C :135 Min. :19.00 MEM : 74 NOH : 74
## 1st Qu.:2013 PF:305 1st Qu.:23.00 DEN : 68 ATL : 67
## Median :2015 PG:419 Median :26.00 SAC : 66 MEM : 67
## Mean :2015 SF:398 Mean :26.48 MIN : 63 PHO : 67
## 3rd Qu.:2016 SG:442 3rd Qu.:29.00 TOR : 63 SAC : 66
## Max. :2017 Max. :40.00 NOH : 60 MIN : 63
## (Other):1305 (Other):1295
## year_start year_end career_year age_at_car_start
## Min. :1995 Min. :2013 Min. : 0.000 Min. :19.00
## 1st Qu.:2006 1st Qu.:2017 1st Qu.: 2.000 1st Qu.:21.00
## Median :2010 Median :2018 Median : 4.000 Median :22.00
## Mean :2010 Mean :2017 Mean : 5.075 Mean :22.64
## 3rd Qu.:2013 3rd Qu.:2018 3rd Qu.: 8.000 3rd Qu.:23.00
## Max. :2017 Max. :2018 Max. :19.000 Max. :37.00
##
## salary adj salary_infl G
## Min. : 0.01991 Min. :1.025 Min. : 0.02113 Min. : 6.00
## 1st Qu.: 1.50000 1st Qu.:1.047 1st Qu.: 1.58933 1st Qu.:42.00
## Median : 3.65750 Median :1.060 Median : 3.88130 Median :62.00
## Mean : 6.08240 Mean :1.058 Mean : 6.41847 Mean :57.24
## 3rd Qu.: 8.33892 3rd Qu.:1.078 3rd Qu.: 8.83999 3rd Qu.:75.00
## Max. :34.68255 Max. :1.094 Max. :35.54961 Max. :82.00
##
## GS MP PER TS.
## Min. : 0.00 Min. : 31.0 Min. : 0.80 Min. :0.2550
## 1st Qu.: 3.00 1st Qu.: 785.5 1st Qu.:11.00 1st Qu.:0.4990
## Median :19.00 Median :1398.0 Median :13.50 Median :0.5310
## Mean :29.51 Mean :1418.3 Mean :13.96 Mean :0.5288
## 3rd Qu.:57.00 3rd Qu.:2040.5 3rd Qu.:16.45 3rd Qu.:0.5600
## Max. :82.00 Max. :3167.0 Max. :31.60 Max. :0.6990
##
## X3PAr FTr ORB. DRB.
## Min. :0.0010 Min. :0.0240 Min. : 0.000 Min. : 4.10
## 1st Qu.:0.1980 1st Qu.:0.1640 1st Qu.: 1.800 1st Qu.: 9.50
## Median :0.3280 Median :0.2330 Median : 2.900 Median :12.30
## Mean :0.3261 Mean :0.2465 Mean : 3.958 Mean :13.57
## 3rd Qu.:0.4510 3rd Qu.:0.3105 3rd Qu.: 5.200 3rd Qu.:16.90
## Max. :0.8730 Max. :0.8870 Max. :17.500 Max. :36.30
##
## TRB. AST. STL. BLK.
## Min. : 2.200 Min. : 0.90 Min. :0.100 Min. :0.100
## 1st Qu.: 5.900 1st Qu.: 8.05 1st Qu.:1.200 1st Qu.:0.500
## Median : 7.700 Median :12.20 Median :1.600 Median :0.900
## Mean : 8.764 Mean :15.45 Mean :1.679 Mean :1.274
## 3rd Qu.:11.000 3rd Qu.:20.85 3rd Qu.:2.100 3rd Qu.:1.600
## Max. :25.300 Max. :57.30 Max. :5.100 Max. :9.800
##
## TOV. USG. OWS DWS
## Min. : 1.40 Min. : 8.90 Min. :-3.300 Min. :-0.200
## 1st Qu.:10.30 1st Qu.:15.90 1st Qu.: 0.200 1st Qu.: 0.500
## Median :12.60 Median :19.40 Median : 1.000 Median : 1.100
## Mean :13.02 Mean :19.70 Mean : 1.637 Mean : 1.362
## 3rd Qu.:15.20 3rd Qu.:22.85 3rd Qu.: 2.400 3rd Qu.: 2.000
## Max. :34.50 Max. :41.70 Max. :14.800 Max. : 6.400
##
## WS WS.48 OBPM DBPM
## Min. :-2.100 Min. :-0.11600 Min. :-7.7000 Min. :-4.8000
## 1st Qu.: 0.900 1st Qu.: 0.05100 1st Qu.:-1.8000 1st Qu.:-1.6000
## Median : 2.200 Median : 0.08700 Median :-0.4000 Median :-0.5000
## Mean : 2.998 Mean : 0.08771 Mean :-0.2609 Mean :-0.4287
## 3rd Qu.: 4.200 3rd Qu.: 0.12100 3rd Qu.: 0.9500 3rd Qu.: 0.6000
## Max. :19.300 Max. : 0.32200 Max. :12.4000 Max. : 5.6000
##
## BPM VORP FG FGA
## Min. :-9.2000 Min. :-1.6000 Min. : 4.0 Min. : 10.0
## 1st Qu.:-2.6000 1st Qu.:-0.1000 1st Qu.:101.0 1st Qu.: 239.0
## Median :-0.9000 Median : 0.3000 Median :195.0 Median : 444.0
## Mean :-0.6893 Mean : 0.7935 Mean :227.4 Mean : 510.3
## 3rd Qu.: 1.0000 3rd Qu.: 1.3000 3rd Qu.:320.5 3rd Qu.: 719.0
## Max. :15.6000 Max. :12.4000 Max. :849.0 Max. :1941.0
##
## FG. X3P X3PA X3P.
## Min. :0.2040 Min. : 1.00 Min. : 1.0 Min. :0.0420
## 1st Qu.:0.4040 1st Qu.: 15.00 1st Qu.: 49.0 1st Qu.:0.2960
## Median :0.4330 Median : 43.00 Median :124.0 Median :0.3460
## Mean :0.4355 Mean : 55.11 Mean :153.9 Mean :0.3398
## 3rd Qu.:0.4630 3rd Qu.: 83.00 3rd Qu.:230.0 3rd Qu.:0.3840
## Max. :0.7100 Max. :402.00 Max. :886.0 Max. :1.0000
##
## X2P X2PA X2P. eFG.
## Min. : 2.0 Min. : 5.0 Min. :0.1670 Min. :0.2310
## 1st Qu.: 64.0 1st Qu.: 139.0 1st Qu.:0.4410 1st Qu.:0.4610
## Median :136.0 Median : 284.0 Median :0.4730 Median :0.4920
## Mean :172.2 Mean : 356.3 Mean :0.4729 Mean :0.4926
## 3rd Qu.:242.0 3rd Qu.: 498.5 3rd Qu.:0.5070 3rd Qu.:0.5240
## Max. :706.0 Max. :1421.0 Max. :0.7130 Max. :0.7110
##
## FT FTA FT. ORB
## Min. : 1.0 Min. : 2.0 Min. :0.250 Min. : 0.00
## 1st Qu.: 32.0 1st Qu.: 45.0 1st Qu.:0.713 1st Qu.: 18.00
## Median : 74.0 Median : 95.0 Median :0.782 Median : 33.00
## Mean :105.9 Mean :135.4 Mean :0.767 Mean : 50.12
## 3rd Qu.:143.0 3rd Qu.:186.5 3rd Qu.:0.833 3rd Qu.: 63.50
## Max. :746.0 Max. :881.0 Max. :1.000 Max. :397.00
##
## DRB TRB AST STL
## Min. : 2.0 Min. : 2.0 Min. : 1.0 Min. : 1.00
## 1st Qu.: 80.0 1st Qu.: 102.0 1st Qu.: 49.0 1st Qu.: 22.00
## Median :151.0 Median : 187.0 Median : 99.0 Median : 40.00
## Mean :174.7 Mean : 224.9 Mean :145.8 Mean : 47.53
## 3rd Qu.:229.0 3rd Qu.: 297.5 3rd Qu.:193.0 3rd Qu.: 65.00
## Max. :829.0 Max. :1226.0 Max. :906.0 Max. :191.00
##
## BLK TOV PF PTS
## Min. : 1.00 Min. : 1.00 Min. : 1.0 Min. : 13.0
## 1st Qu.: 6.00 1st Qu.: 38.00 1st Qu.: 64.0 1st Qu.: 270.0
## Median : 14.00 Median : 68.00 Median :108.0 Median : 523.0
## Mean : 23.11 Mean : 83.45 Mean :110.8 Mean : 615.7
## 3rd Qu.: 29.00 3rd Qu.:115.50 3rd Qu.:152.0 3rd Qu.: 857.0
## Max. :242.00 Max. :464.00 Max. :301.0 Max. :2593.0
##
## PPG MPG PFG FTPG
## Min. : 0.9091 Min. : 4.00 Min. :0.08333 Min. :0.03448
## 1st Qu.: 5.7958 1st Qu.:17.43 1st Qu.:1.41613 1st Qu.:0.70841
## Median : 9.0833 Median :24.19 Median :1.86301 Median :1.28571
## Mean :10.1268 Mean :23.68 Mean :1.86538 Mean :1.72300
## 3rd Qu.:13.4619 3rd Qu.:30.46 3rd Qu.:2.30488 3rd Qu.:2.28070
## Max. :32.0123 Max. :39.43 Max. :4.41176 Max. :9.20988
##
## X3PPG X2PPG TRBPG ASTPG
## Min. :0.01219 Min. :0.08333 Min. : 0.200 Min. : 0.09091
## 1st Qu.:0.35966 1st Qu.:1.39737 1st Qu.: 2.173 1st Qu.: 1.00000
## Median :0.81250 Median :2.40909 Median : 3.154 Median : 1.80556
## Mean :0.91653 Mean :2.82712 Mean : 3.707 Mean : 2.43068
## 3rd Qu.:1.35135 3rd Qu.:3.87394 3rd Qu.: 4.687 3rd Qu.: 3.22391
## Max. :5.08861 Max. :9.40580 Max. :14.951 Max. :11.65278
##
## STLPG BLKPG TOVPG stayed
## Min. :0.05263 Min. :0.01282 Min. :0.03846 Mode :logical
## 1st Qu.:0.46575 1st Qu.:0.13699 1st Qu.:0.77778 FALSE:716
## Median :0.70588 Median :0.26667 Median :1.20833 TRUE :983
## Mean :0.78949 Mean :0.37601 Mean :1.39515
## 3rd Qu.:1.00000 3rd Qu.:0.45738 3rd Qu.:1.83929
## Max. :2.53333 Max. :3.65151 Max. :5.72839
##
## points_per_min
## Min. :0.1107
## 1st Qu.:0.3206
## Median :0.3956
## Mean :0.4068
## 3rd Qu.:0.4782
## Max. :0.9129
##
##
## Calls:
## m1: lm(formula = I(sqrt(salary_infl)) ~ I(career_year), data = nba_stats_salar_non_zero)
## m2: lm(formula = I(sqrt(salary_infl)) ~ I(career_year) + stayed,
## data = nba_stats_salar_non_zero)
## m3: lm(formula = I(sqrt(salary_infl)) ~ I(career_year) + stayed +
## FTPG, data = nba_stats_salar_non_zero)
## m4: lm(formula = I(sqrt(salary_infl)) ~ I(career_year) + stayed +
## FTPG + X2PPG, data = nba_stats_salar_non_zero)
## m5: lm(formula = I(sqrt(salary_infl)) ~ I(career_year) + stayed +
## FTPG + X2PPG + X3PPG, data = nba_stats_salar_non_zero)
## m6: lm(formula = I(sqrt(salary_infl)) ~ I(career_year) + stayed +
## FTPG + X2PPG + X3PPG + TRBPG, data = nba_stats_salar_non_zero)
## m7: lm(formula = I(sqrt(salary_infl)) ~ I(career_year) + stayed +
## FTPG + X2PPG + X3PPG + TRBPG + ASTPG, data = nba_stats_salar_non_zero)
## m8: lm(formula = I(sqrt(salary_infl)) ~ I(career_year) + stayed +
## FTPG + X2PPG + X3PPG + TRBPG + ASTPG + STLPG, data = nba_stats_salar_non_zero)
## m9: lm(formula = I(sqrt(salary_infl)) ~ I(career_year) + stayed +
## FTPG + X2PPG + X3PPG + TRBPG + ASTPG + STLPG + log(TOVPG),
## data = nba_stats_salar_non_zero)
## m10: lm(formula = I(sqrt(salary_infl)) ~ I(career_year) + stayed +
## FTPG + X2PPG + X3PPG + TRBPG + ASTPG + STLPG + log(TOVPG) +
## sqrt(PFG), data = nba_stats_salar_non_zero)
## m11: lm(formula = I(sqrt(salary_infl)) ~ I(career_year) + stayed +
## FTPG + X2PPG + X3PPG + TRBPG + ASTPG + STLPG + log(TOVPG) +
## sqrt(PFG) + BLKPG, data = nba_stats_salar_non_zero)
## m12: lm(formula = I(sqrt(salary_infl)) ~ I(career_year) + stayed +
## FTPG + X2PPG + X3PPG + TRBPG + ASTPG + STLPG + log(TOVPG) +
## sqrt(PFG) + BLKPG + TS., data = nba_stats_salar_non_zero)
##
## ================================================================================================================================================================
##                      m1          m2          m3          m4          m5          m6          m7          m8          m9          m10         m11         m12
## ----------------------------------------------------------------------------------------------------------------------------------------------------------------
## (Intercept)         1.758***    1.313***    0.829***    0.634***    0.391***    0.242***    0.216***    0.206***    0.005       0.484**     0.528***    0.330
##                    (0.043)     (0.055)     (0.047)     (0.047)     (0.050)     (0.051)     (0.052)     (0.054)     (0.072)     (0.148)     (0.149)     (0.240)
## I(career_year)      0.095***    0.108***    0.079***    0.072***    0.064***    0.060***    0.059***    0.059***    0.058***    0.058***    0.058***    0.057***
##                    (0.007)     (0.006)     (0.005)     (0.005)     (0.005)     (0.005)     (0.005)     (0.005)     (0.005)     (0.005)     (0.005)     (0.005)
## stayed                          0.654***    0.370***    0.305***    0.288***    0.252***    0.252***    0.250***    0.244***    0.251***    0.245***    0.243***
##                                (0.053)     (0.044)     (0.042)     (0.041)     (0.040)     (0.040)     (0.040)     (0.040)     (0.039)     (0.039)     (0.039)
## FTPG                                        0.462***    0.222***    0.106***    0.115***    0.102***    0.101***    0.105***    0.092***    0.092***    0.089***
##                                            (0.015)     (0.023)     (0.024)     (0.024)     (0.024)     (0.024)     (0.024)     (0.024)     (0.024)     (0.024)
## X2PPG                                                   0.241***    0.303***    0.203***    0.185***    0.184***    0.213***    0.210***    0.207***    0.205***
##                                                        (0.018)     (0.018)     (0.021)     (0.021)     (0.021)     (0.022)     (0.022)     (0.022)     (0.022)
## X3PPG                                                               0.346***    0.373***    0.357***    0.353***    0.375***    0.389***    0.400***    0.387***
##                                                                    (0.031)     (0.031)     (0.031)     (0.032)     (0.032)     (0.032)     (0.032)     (0.035)
## TRBPG                                                                           0.117***    0.127***    0.125***    0.133***    0.153***    0.137***    0.137***
##                                                                                (0.012)     (0.013)     (0.013)     (0.013)     (0.014)     (0.015)     (0.015)
## ASTPG                                                                                       0.034**     0.029*      0.062***    0.051**     0.055***    0.055***
##                                                                                            (0.012)     (0.014)     (0.016)     (0.016)     (0.016)     (0.016)
## STLPG                                                                                                   0.045       0.081       0.133*      0.140*      0.143*
##                                                                                                        (0.061)     (0.062)     (0.063)     (0.063)     (0.063)
## log(TOVPG)                                                                                                         -0.238***   -0.149*     -0.145*     -0.135*
##                                                                                                                    (0.057)     (0.062)     (0.062)     (0.062)
## sqrt(PFG)                                                                                                                      -0.421***   -0.466***   -0.475***
##                                                                                                                                (0.114)     (0.115)     (0.115)
## BLKPG                                                                                                                                       0.154*      0.146*
##                                                                                                                                            (0.069)     (0.070)
## TS.                                                                                                                                                     0.449
##                                                                                                                                                        (0.425)
## ----------------------------------------------------------------------------------------------------------------------------------------------------------------
## R-squared           0.109       0.182       0.474       0.525       0.557       0.580       0.582       0.582       0.586       0.590       0.591       0.591
## adj. R-squared      0.109       0.181       0.473       0.524       0.555       0.579       0.580       0.580       0.584       0.587       0.588       0.588
## sigma               1.118       1.072       0.860       0.817       0.790       0.769       0.768       0.768       0.764       0.761       0.760       0.760
## F                 208.044     188.093     509.610     468.253     425.381     389.482     336.353     294.297     266.084     242.660     221.565     203.207
## p                   0.000       0.000       0.000       0.000       0.000       0.000       0.000       0.000       0.000       0.000       0.000       0.000
## Log-likelihood  -2599.968   -2528.025   -2152.074   -2065.638   -2006.949   -1961.198   -1957.206   -1956.933   -1948.199   -1941.304   -1938.819   -1938.258
## Deviance         2122.827    1950.451    1252.955    1131.739    1056.191    1000.813     996.121     995.801     985.615     977.648     974.792     974.149
## AIC              5205.935    5064.050    4314.148    4143.276    4027.898    3938.395    3932.413    3933.866    3918.398    3906.608    3903.638    3904.517
## BIC              5222.249    5085.801    4341.337    4175.903    4065.962    3981.898    3981.353    3988.244    3978.214    3971.861    3974.329    3980.646
## N                1699        1699        1699        1699        1699        1699        1699        1699        1699        1699        1699        1699
## ================================================================================================================================================================
##
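As a check on how the fit statistics in the all-players table relate to each other, the following sketch recomputes AIC and BIC for m1 from its reported log-likelihood. It assumes the usual definitions AIC = -2 logLik + 2k and BIC = -2 logLik + k log(n), with k = 3 estimated parameters for m1 (intercept, career_year slope, residual variance); the variable names are illustrative.

```r
# Values copied from the m1 column of the all-players table.
log_lik <- -2599.968
n_obs   <- 1699
k       <- 3   # intercept, career_year slope, residual variance

aic <- -2 * log_lik + 2 * k
bic <- -2 * log_lik + log(n_obs) * k

round(c(AIC = aic, BIC = bic), 3)
```

Up to rounding of the log-likelihood, these agree with the reported 5205.935 and 5222.249. Reading the table with these criteria, AIC is lowest for m11 (3903.638) and BIC for m10 (3971.861), so the last terms added contribute little once the model size is penalized.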
## Calls:
## m1: lm(formula = I(sqrt(salary_infl)) ~ I(career_year), data = subset(nba_stats_salar_non_zero,
## Pos == "C"))
## m2: lm(formula = I(sqrt(salary_infl)) ~ I(career_year) + stayed,
## data = subset(nba_stats_salar_non_zero, Pos == "C"))
## m3: lm(formula = I(sqrt(salary_infl)) ~ I(career_year) + stayed +
## FTPG, data = subset(nba_stats_salar_non_zero, Pos == "C"))
## m4: lm(formula = I(sqrt(salary_infl)) ~ I(career_year) + stayed +
## FTPG + X2PPG, data = subset(nba_stats_salar_non_zero, Pos ==
## "C"))
## m5: lm(formula = I(sqrt(salary_infl)) ~ I(career_year) + stayed +
## FTPG + X2PPG + X3PPG, data = subset(nba_stats_salar_non_zero,
## Pos == "C"))
## m6: lm(formula = I(sqrt(salary_infl)) ~ I(career_year) + stayed +
## FTPG + X2PPG + X3PPG + TRBPG, data = subset(nba_stats_salar_non_zero,
## Pos == "C"))
## m7: lm(formula = I(sqrt(salary_infl)) ~ I(career_year) + stayed +
## FTPG + X2PPG + X3PPG + TRBPG + ASTPG, data = subset(nba_stats_salar_non_zero,
## Pos == "C"))
## m8: lm(formula = I(sqrt(salary_infl)) ~ I(career_year) + stayed +
## FTPG + X2PPG + X3PPG + TRBPG + ASTPG + STLPG, data = subset(nba_stats_salar_non_zero,
## Pos == "C"))
## m9: lm(formula = I(sqrt(salary_infl)) ~ I(career_year) + stayed +
## FTPG + X2PPG + X3PPG + TRBPG + ASTPG + STLPG + log(TOVPG),
## data = subset(nba_stats_salar_non_zero, Pos == "C"))
## m10: lm(formula = I(sqrt(salary_infl)) ~ I(career_year) + stayed +
## FTPG + X2PPG + X3PPG + TRBPG + ASTPG + STLPG + log(TOVPG) +
## sqrt(PFG), data = subset(nba_stats_salar_non_zero, Pos ==
## "C"))
## m11: lm(formula = I(sqrt(salary_infl)) ~ I(career_year) + stayed +
## FTPG + X2PPG + X3PPG + TRBPG + ASTPG + STLPG + log(TOVPG) +
## sqrt(PFG) + BLKPG, data = subset(nba_stats_salar_non_zero,
## Pos == "C"))
## m12: lm(formula = I(sqrt(salary_infl)) ~ I(career_year) + stayed +
## FTPG + X2PPG + X3PPG + TRBPG + ASTPG + STLPG + log(TOVPG) +
## sqrt(PFG) + BLKPG + TS., data = subset(nba_stats_salar_non_zero,
## Pos == "C"))
##
## ================================================================================================================================================================
##                      m1          m2          m3          m4          m5          m6          m7          m8          m9          m10         m11         m12
## ----------------------------------------------------------------------------------------------------------------------------------------------------------------
## (Intercept)         2.317***    1.950***    1.132***    0.631**     0.545*      0.309       0.295       0.318       0.321       0.444       0.611       0.701
##                    (0.168)     (0.254)     (0.241)     (0.231)     (0.236)     (0.258)     (0.259)     (0.260)     (0.332)     (0.898)     (0.899)     (1.272)
## I(career_year)      0.076**     0.087***    0.082***    0.074***    0.071***    0.071***    0.073***    0.072***    0.072***    0.071***    0.064**     0.064**
##                    (0.024)     (0.024)     (0.021)     (0.018)     (0.018)     (0.018)     (0.018)     (0.018)     (0.019)     (0.020)     (0.020)     (0.020)
## stayed                          0.443       0.304       0.164       0.161       0.165       0.170       0.195       0.195       0.194       0.161       0.162
##                                (0.232)     (0.197)     (0.177)     (0.176)     (0.174)     (0.175)     (0.177)     (0.178)     (0.179)     (0.179)     (0.180)
## FTPG                                        0.449***    0.106       0.061       0.037       0.041       0.059       0.058       0.059       0.038       0.039
##                                            (0.061)     (0.080)     (0.084)     (0.084)     (0.084)     (0.087)     (0.096)     (0.097)     (0.097)     (0.098)
## X2PPG                                                   0.325***    0.351***    0.261***    0.275***    0.267***    0.267***    0.263**     0.272**     0.272**
##                                                        (0.056)     (0.058)     (0.071)     (0.074)     (0.075)     (0.078)     (0.082)     (0.081)     (0.082)
## X3PPG                                                               0.258       0.290       0.332       0.321       0.320       0.321       0.407*      0.408*
##                                                                    (0.164)     (0.162)     (0.176)     (0.176)     (0.178)     (0.178)     (0.185)     (0.187)
## TRBPG                                                                           0.096*      0.098*      0.115*      0.115*      0.118*      0.096       0.096
##                                                                                (0.044)     (0.045)     (0.048)     (0.051)     (0.054)     (0.056)     (0.056)
## ASTPG                                                                                      -0.059      -0.032      -0.033      -0.035      -0.063      -0.064
##                                                                                            (0.095)     (0.099)     (0.104)     (0.105)     (0.106)     (0.107)
## STLPG                                                                                                  -0.302      -0.302      -0.297      -0.243      -0.243
##                                                                                                        (0.324)     (0.326)     (0.329)     (0.329)     (0.330)
## log(TOVPG)                                                                                                          0.004       0.025       0.021       0.017
##                                                                                                                    (0.308)     (0.341)     (0.339)     (0.343)
## sqrt(PFG)                                                                                                                      -0.084      -0.257      -0.257
##                                                                                                                                (0.566)     (0.573)     (0.575)
## BLKPG                                                                                                                                       0.289       0.291
##                                                                                                                                            (0.182)     (0.184)
## TS.                                                                                                                                                    -0.167
##                                                                                                                                                        (1.670)
## ----------------------------------------------------------------------------------------------------------------------------------------------------------------
## R-squared           0.070       0.095       0.363       0.496       0.505       0.522       0.524       0.527       0.527       0.527       0.537       0.537
## adj. R-squared      0.063       0.081       0.348       0.480       0.486       0.500       0.498       0.497       0.493       0.489       0.495       0.491
## sigma               1.221       1.209       1.018       0.910       0.904       0.892       0.894       0.895       0.898       0.902       0.896       0.900
## F                   9.989       6.912      24.858      31.926      26.333      23.342      19.965      17.560      15.485      13.830      12.957      11.782
## p                   0.002       0.001       0.000       0.000       0.000       0.000       0.000       0.000       0.000       0.000       0.000       0.000
## Log-likelihood   -217.521    -215.686    -191.993    -176.220    -174.928    -172.517    -172.313    -171.851    -171.850    -171.839    -170.466    -170.460
## Deviance          198.328     193.012     135.875     107.562     105.522     101.820     101.513     100.819     100.819     100.802      98.772      98.764
## AIC               441.041     439.373     393.986     364.440     363.855     361.034     362.627     363.701     365.701     367.677     366.932     368.921
## BIC               449.757     450.994     408.512     381.872     384.192     384.276     388.774     392.754     397.659     402.540     404.700     409.595
## N                 135         135         135         135         135         135         135         135         135         135         135         135
## ================================================================================================================================================================
##
## Calls:
## m1: lm(formula = I(sqrt(salary_infl)) ~ I(career_year), data = subset(nba_stats_salar_non_zero,
## Pos == "PG"))
## m2: lm(formula = I(sqrt(salary_infl)) ~ I(career_year) + stayed,
## data = subset(nba_stats_salar_non_zero, Pos == "PG"))
## m3: lm(formula = I(sqrt(salary_infl)) ~ I(career_year) + stayed +
## FTPG, data = subset(nba_stats_salar_non_zero, Pos == "PG"))
## m4: lm(formula = I(sqrt(salary_infl)) ~ I(career_year) + stayed +
## FTPG + X2PPG, data = subset(nba_stats_salar_non_zero, Pos ==
## "PG"))
## m5: lm(formula = I(sqrt(salary_infl)) ~ I(career_year) + stayed +
## FTPG + X2PPG + X3PPG, data = subset(nba_stats_salar_non_zero,
## Pos == "PG"))
## m6: lm(formula = I(sqrt(salary_infl)) ~ I(career_year) + stayed +
## FTPG + X2PPG + X3PPG + TRBPG, data = subset(nba_stats_salar_non_zero,
## Pos == "PG"))
## m7: lm(formula = I(sqrt(salary_infl)) ~ I(career_year) + stayed +
## FTPG + X2PPG + X3PPG + TRBPG + ASTPG, data = subset(nba_stats_salar_non_zero,
## Pos == "PG"))
## m8: lm(formula = I(sqrt(salary_infl)) ~ I(career_year) + stayed +
## FTPG + X2PPG + X3PPG + TRBPG + ASTPG + STLPG, data = subset(nba_stats_salar_non_zero,
## Pos == "PG"))
## m9: lm(formula = I(sqrt(salary_infl)) ~ I(career_year) + stayed +
## FTPG + X2PPG + X3PPG + TRBPG + ASTPG + STLPG + log(TOVPG),
## data = subset(nba_stats_salar_non_zero, Pos == "PG"))
## m10: lm(formula = I(sqrt(salary_infl)) ~ I(career_year) + stayed +
## FTPG + X2PPG + X3PPG + TRBPG + ASTPG + STLPG + log(TOVPG) +
## sqrt(PFG), data = subset(nba_stats_salar_non_zero, Pos ==
## "PG"))
## m11: lm(formula = I(sqrt(salary_infl)) ~ I(career_year) + stayed +
## FTPG + X2PPG + X3PPG + TRBPG + ASTPG + STLPG + log(TOVPG) +
## sqrt(PFG) + BLKPG, data = subset(nba_stats_salar_non_zero,
## Pos == "PG"))
## m12: lm(formula = I(sqrt(salary_infl)) ~ I(career_year) + stayed +
## FTPG + X2PPG + X3PPG + TRBPG + ASTPG + STLPG + log(TOVPG) +
## sqrt(PFG) + BLKPG + TS., data = subset(nba_stats_salar_non_zero,
## Pos == "PG"))
##
## ================================================================================================================================================================
##                      m1          m2          m3          m4          m5          m6          m7          m8          m9          m10         m11         m12
## ----------------------------------------------------------------------------------------------------------------------------------------------------------------
## (Intercept)         1.767***    1.208***    0.599***    0.374***    0.165       0.079       0.069       0.068      -0.285*      0.031       0.058      -0.715
##                    (0.092)     (0.109)     (0.092)     (0.097)     (0.099)     (0.108)     (0.106)     (0.110)     (0.130)     (0.287)     (0.288)     (0.517)
## I(career_year)      0.069***    0.088***    0.067***    0.064***    0.056***    0.056***    0.047***    0.047***    0.046***    0.048***    0.049***    0.044***
##                    (0.014)     (0.014)     (0.011)     (0.010)     (0.010)     (0.010)     (0.010)     (0.010)     (0.010)     (0.010)     (0.010)     (0.010)
## stayed                          0.879***    0.459***    0.383***    0.347***    0.348***    0.356***    0.356***    0.318***    0.324***    0.319***    0.318***
##                                (0.108)     (0.087)     (0.085)     (0.081)     (0.081)     (0.080)     (0.081)     (0.079)     (0.079)     (0.080)     (0.079)
## FTPG                                        0.489***    0.286***    0.167***    0.148**     0.142**     0.142**     0.146**     0.140**     0.136**     0.121*
##                                            (0.029)     (0.046)     (0.047)     (0.048)     (0.048)     (0.048)     (0.047)     (0.047)     (0.047)     (0.048)
## X2PPG                                                   0.232***    0.262***    0.231***    0.181***    0.181***    0.263***    0.258***    0.252***    0.241***
##                                                        (0.041)     (0.040)     (0.043)     (0.044)     (0.044)     (0.046)     (0.047)     (0.047)     (0.047)
## X3PPG                                                               0.413***    0.393***    0.367***    0.366***    0.422***    0.435***    0.447***    0.391***
##                                                                    (0.065)     (0.065)     (0.065)     (0.065)     (0.065)     (0.065)     (0.066)     (0.073)
## TRBPG                                                                           0.083       0.001       0.001       0.027       0.042       0.029       0.041
##                                                                                (0.043)     (0.047)     (0.050)     (0.049)     (0.050)     (0.052)     (0.052)
## ASTPG                                                                                       0.102***    0.102***    0.175***    0.163***    0.171***    0.173***
##                                                                                            (0.028)     (0.030)     (0.033)     (0.034)     (0.035)     (0.035)
## STLPG                                                                                                   0.005       0.062       0.121       0.114       0.089
##                                                                                                        (0.121)     (0.119)     (0.128)     (0.128)     (0.129)
## log(TOVPG)                                                                                                         -0.755***   -0.674***   -0.683***   -0.655***
##                                                                                                                    (0.158)     (0.171)     (0.172)     (0.172)
## sqrt(PFG)                                                                                                                      -0.299      -0.344      -0.351
##                                                                                                                                (0.242)     (0.246)     (0.245)
## BLKPG                                                                                                                                       0.295       0.295
##                                                                                                                                            (0.293)     (0.292)
## TS.                                                                                                                                                     1.727
##                                                                                                                                                        (0.959)
## ----------------------------------------------------------------------------------------------------------------------------------------------------------------
## R-squared           0.052       0.183       0.514       0.548       0.589       0.592       0.605       0.605       0.626       0.628       0.629       0.632
## adj. R-squared      0.050       0.179       0.510       0.544       0.584       0.586       0.599       0.598       0.618       0.619       0.619       0.621
## sigma               1.168       1.086       0.839       0.810       0.773       0.771       0.759       0.760       0.741       0.740       0.740       0.738
## F                  22.962      46.437     146.031     125.437     118.173      99.767      90.110      78.655      76.148      68.774      62.617      57.985
## p                   0.000       0.000       0.000       0.000       0.000       0.000       0.000       0.000       0.000       0.000       0.000       0.000
## Log-likelihood   -658.599    -627.611    -518.865    -503.512    -483.758    -481.848    -474.977    -474.976    -463.642    -462.861    -462.338    -460.671
## Deviance          568.877     490.660     291.978     271.345     246.929     244.689     236.793     236.792     224.322     223.487     222.930     221.163
## AIC              1323.198    1263.223    1047.730    1019.024     981.516     979.697     967.953     969.951     949.285     949.721     950.675     949.342
## BIC              1335.311    1279.374    1067.920    1043.251    1009.781    1012.000    1004.294    1010.330     993.701     998.176    1003.167    1005.872
## N                 419         419         419         419         419         419         419         419         419         419         419         419
## ================================================================================================================================================================
##
## Calls:
## m1: lm(formula = I(sqrt(salary_infl)) ~ I(career_year), data = subset(nba_stats_salar_non_zero,
## Pos == "SG"))
## m2: lm(formula = I(sqrt(salary_infl)) ~ I(career_year) + stayed,
## data = subset(nba_stats_salar_non_zero, Pos == "SG"))
## m3: lm(formula = I(sqrt(salary_infl)) ~ I(career_year) + stayed +
## FTPG, data = subset(nba_stats_salar_non_zero, Pos == "SG"))
## m4: lm(formula = I(sqrt(salary_infl)) ~ I(career_year) + stayed +
## FTPG + X2PPG, data = subset(nba_stats_salar_non_zero, Pos ==
## "SG"))
## m5: lm(formula = I(sqrt(salary_infl)) ~ I(career_year) + stayed +
## FTPG + X2PPG + X3PPG, data = subset(nba_stats_salar_non_zero,
## Pos == "SG"))
## m6: lm(formula = I(sqrt(salary_infl)) ~ I(career_year) + stayed +
## FTPG + X2PPG + X3PPG + TRBPG, data = subset(nba_stats_salar_non_zero,
## Pos == "SG"))
## m7: lm(formula = I(sqrt(salary_infl)) ~ I(career_year) + stayed +
## FTPG + X2PPG + X3PPG + TRBPG + ASTPG, data = subset(nba_stats_salar_non_zero,
## Pos == "SG"))
## m8: lm(formula = I(sqrt(salary_infl)) ~ I(career_year) + stayed +
## FTPG + X2PPG + X3PPG + TRBPG + ASTPG + STLPG, data = subset(nba_stats_salar_non_zero,
## Pos == "SG"))
## m9: lm(formula = I(sqrt(salary_infl)) ~ I(career_year) + stayed +
## FTPG + X2PPG + X3PPG + TRBPG + ASTPG + STLPG + log(TOVPG),
## data = subset(nba_stats_salar_non_zero, Pos == "SG"))
## m10: lm(formula = I(sqrt(salary_infl)) ~ I(career_year) + stayed +
## FTPG + X2PPG + X3PPG + TRBPG + ASTPG + STLPG + log(TOVPG) +
## sqrt(PFG), data = subset(nba_stats_salar_non_zero, Pos ==
## "SG"))
## m11: lm(formula = I(sqrt(salary_infl)) ~ I(career_year) + stayed +
## FTPG + X2PPG + X3PPG + TRBPG + ASTPG + STLPG + log(TOVPG) +
## sqrt(PFG) + BLKPG, data = subset(nba_stats_salar_non_zero,
## Pos == "SG"))
## m12: lm(formula = I(sqrt(salary_infl)) ~ I(career_year) + stayed +
## FTPG + X2PPG + X3PPG + TRBPG + ASTPG + STLPG + log(TOVPG) +
## sqrt(PFG) + BLKPG + TS., data = subset(nba_stats_salar_non_zero,
## Pos == "SG"))
##
## ================================================================================================================================================================
##                      m1          m2          m3          m4          m5          m6          m7          m8          m9          m10         m11         m12
## ----------------------------------------------------------------------------------------------------------------------------------------------------------------
## (Intercept)         1.636***    1.367***    0.996***    0.745***    0.506***    0.378***    0.367***    0.365***    0.121       0.690*      0.699*      0.652
##                    (0.074)     (0.098)     (0.083)     (0.084)     (0.091)     (0.102)     (0.101)     (0.103)     (0.155)     (0.283)     (0.284)     (0.496)
## I(career_year)      0.092***    0.100***    0.061***    0.059***    0.052***    0.051***    0.046***    0.046***    0.046***    0.044***    0.044***    0.044***
##                    (0.012)     (0.012)     (0.010)     (0.009)     (0.009)     (0.009)     (0.009)     (0.009)     (0.009)     (0.009)     (0.009)     (0.009)
## stayed                          0.401***    0.137       0.110       0.105       0.085       0.085       0.085       0.089       0.084       0.082       0.082
##                                (0.098)     (0.081)     (0.076)     (0.073)     (0.073)     (0.072)     (0.072)     (0.072)     (0.072)     (0.072)     (0.072)
## FTPG                                        0.430***    0.159***    0.104*      0.100*      0.079       0.079       0.079       0.063       0.063       0.062
##                                            (0.028)     (0.044)     (0.043)     (0.043)     (0.043)     (0.043)     (0.043)     (0.043)     (0.043)     (0.044)
## X2PPG                                                   0.289***    0.300***    0.249***    0.217***    0.216***    0.240***    0.243***    0.244***    0.244***
##                                                        (0.037)     (0.036)     (0.040)     (0.042)     (0.042)     (0.044)     (0.044)     (0.044)     (0.044)
## X3PPG                                                               0.309***    0.288***    0.291***    0.291***    0.323***    0.357***    0.358***    0.355***
##                                                                    (0.054)     (0.054)     (0.053)     (0.053)     (0.055)     (0.057)     (0.057)     (0.065)
## TRBPG                                                                           0.114**     0.087*      0.084       0.099*      0.131**     0.125*      0.125*
##                                                                                (0.042)     (0.043)     (0.047)     (0.047)     (0.049)     (0.051)     (0.051)
## ASTPG                                                                                       0.102**     0.101**     0.151***    0.141**     0.142**     0.142**
##                                                                                            (0.037)     (0.037)     (0.044)     (0.044)     (0.044)     (0.044)
## STLPG                                                                                                   0.021       0.048       0.121       0.107       0.108
##                                                                                                        (0.137)     (0.137)     (0.140)     (0.144)     (0.144)
## log(TOVPG)                                                                                                         -0.237*     -0.147      -0.145      -0.144
##                                                                                                                    (0.114)     (0.120)     (0.120)     (0.121)
## sqrt(PFG)                                                                                                                      -0.552*     -0.559*     -0.557*
##                                                                                                                                (0.230)     (0.231)     (0.232)
## BLKPG                                                                                                                                       0.095       0.095
##                                                                                                                                            (0.247)     (0.247)
## TS.                                                                                                                                                     0.099
##                                                                                                                                                        (0.850)
## ----------------------------------------------------------------------------------------------------------------------------------------------------------------
## R-squared           0.127       0.159       0.453       0.519       0.553       0.561       0.568       0.568       0.573       0.578       0.579       0.579
## adj. R-squared      0.125       0.155       0.449       0.515       0.548       0.555       0.562       0.561       0.564       0.569       0.568       0.567
## sigma               1.017       0.999       0.807       0.757       0.731       0.726       0.720       0.721       0.718       0.714       0.715       0.716
## F                  63.925      41.541     120.963     118.016     107.929      92.543      81.673      71.306      64.347      59.129      53.661      49.077
## p                   0.000       0.000       0.000       0.000       0.000       0.000       0.000       0.000       0.000       0.000       0.000       0.000
## Log-likelihood   -633.739    -625.413    -530.344    -501.839    -485.713    -481.919    -477.989    -477.977    -475.781    -472.842    -472.766    -472.759
## Deviance          455.333     438.498     285.197     250.685     233.044     229.078     225.040     225.028     222.803     219.860     219.784     219.777
## AIC              1273.478    1258.826    1070.688    1015.679     985.425     979.838     973.977     975.954     973.562     969.684     971.532     973.518
## BIC              1285.752    1275.191    1091.145    1040.227    1014.064    1012.569    1010.799    1016.867    1018.566    1018.780    1024.719    1030.796
## N                 442         442         442         442         442         442         442         442         442         442         442         442
## ================================================================================================================================================================
##
## Calls:
## m1: lm(formula = I(sqrt(salary_infl)) ~ I(career_year), data = subset(nba_stats_salar_non_zero,
## Pos == "PF"))
## m2: lm(formula = I(sqrt(salary_infl)) ~ I(career_year) + stayed,
## data = subset(nba_stats_salar_non_zero, Pos == "PF"))
## m3: lm(formula = I(sqrt(salary_infl)) ~ I(career_year) + stayed +
## FTPG, data = subset(nba_stats_salar_non_zero, Pos == "PF"))
## m4: lm(formula = I(sqrt(salary_infl)) ~ I(career_year) + stayed +
## FTPG + X2PPG, data = subset(nba_stats_salar_non_zero, Pos ==
## "PF"))
## m5: lm(formula = I(sqrt(salary_infl)) ~ I(career_year) + stayed +
## FTPG + X2PPG + X3PPG, data = subset(nba_stats_salar_non_zero,
## Pos == "PF"))
## m6: lm(formula = I(sqrt(salary_infl)) ~ I(career_year) + stayed +
## FTPG + X2PPG + X3PPG + TRBPG, data = subset(nba_stats_salar_non_zero,
## Pos == "PF"))
## m7: lm(formula = I(sqrt(salary_infl)) ~ I(career_year) + stayed +
## FTPG + X2PPG + X3PPG + TRBPG + ASTPG, data = subset(nba_stats_salar_non_zero,
## Pos == "PF"))
## m8: lm(formula = I(sqrt(salary_infl)) ~ I(career_year) + stayed +
## FTPG + X2PPG + X3PPG + TRBPG + ASTPG + STLPG, data = subset(nba_stats_salar_non_zero,
## Pos == "PF"))
## m9: lm(formula = I(sqrt(salary_infl)) ~ I(career_year) + stayed +
## FTPG + X2PPG + X3PPG + TRBPG + ASTPG + STLPG + log(TOVPG),
## data = subset(nba_stats_salar_non_zero, Pos == "PF"))
## m10: lm(formula = I(sqrt(salary_infl)) ~ I(career_year) + stayed +
## FTPG + X2PPG + X3PPG + TRBPG + ASTPG + STLPG + log(TOVPG) +
## sqrt(PFG), data = subset(nba_stats_salar_non_zero, Pos ==
## "PF"))
## m11: lm(formula = I(sqrt(salary_infl)) ~ I(career_year) + stayed +
## FTPG + X2PPG + X3PPG + TRBPG + ASTPG + STLPG + log(TOVPG) +
## sqrt(PFG) + BLKPG, data = subset(nba_stats_salar_non_zero,
## Pos == "PF"))
## m12: lm(formula = I(sqrt(salary_infl)) ~ I(career_year) + stayed +
## FTPG + X2PPG + X3PPG + TRBPG + ASTPG + STLPG + log(TOVPG) +
## sqrt(PFG) + BLKPG + TS., data = subset(nba_stats_salar_non_zero,
## Pos == "PF"))
##
## ================================================================================================================================================================
##                      m1          m2          m3          m4          m5          m6          m7          m8          m9          m10         m11         m12
## ----------------------------------------------------------------------------------------------------------------------------------------------------------------
## (Intercept)         1.744***    1.328***    0.935***    0.857***    0.538***    0.343**     0.347**     0.314*      0.392*      0.714       0.750*      0.006
##                    (0.098)     (0.126)     (0.105)     (0.108)     (0.115)     (0.126)     (0.124)     (0.128)     (0.181)     (0.375)     (0.380)     (0.571)
## I(career_year)      0.129***    0.136***    0.096***    0.090***    0.076***    0.074***    0.069***    0.069***    0.070***    0.070***    0.071***    0.069***
##                    (0.015)     (0.014)     (0.012)     (0.012)     (0.011)     (0.011)     (0.011)     (0.011)     (0.011)     (0.011)     (0.011)     (0.011)
## stayed                          0.596***    0.321**     0.303**     0.289**     0.253**     0.240**     0.242**     0.244**     0.254**     0.251**     0.230*
##                                (0.119)     (0.097)     (0.097)     (0.091)     (0.090)     (0.089)     (0.089)     (0.089)     (0.090)     (0.090)     (0.091)
## FTPG                                        0.500***    0.369***    0.157*      0.112       0.072       0.083       0.086       0.072       0.082       0.078
##                                            (0.038)     (0.062)     (0.068)     (0.068)     (0.069)     (0.069)     (0.070)     (0.071)     (0.073)     (0.073)
## X2PPG                                                   0.104**     0.238***    0.156**     0.136**     0.125*      0.116*      0.116*      0.110*      0.098
##                                                        (0.039)     (0.043)     (0.048)     (0.048)     (0.049)     (0.051)     (0.051)     (0.053)     (0.053)
## X3PPG                                                               0.484***    0.439***    0.424***    0.416***    0.411***    0.415***    0.412***    0.373***
##                                                                    (0.080)     (0.079)     (0.078)     (0.079)     (0.079)     (0.079)     (0.080)     (0.083)
## TRBPG                                                                           0.112***    0.102**     0.098**     0.092**     0.100**     0.095**     0.104**
##                                                                                (0.032)     (0.032)     (0.032)     (0.034)     (0.035)     (0.036)     (0.036)
## ASTPG                                                                                       0.155**     0.131*      0.121*      0.113       0.112       0.098
##                                                                                            (0.050)     (0.055)     (0.058)     (0.058)     (0.058)     (0.059)
## STLPG                                                                                                   0.169       0.151       0.186       0.193       0.206
##                                                                                                        (0.167)     (0.170)     (0.173)     (0.174)     (0.173)
## log(TOVPG)                                                                                                          0.077       0.140       0.145       0.185
##                                                                                                                    (0.125)     (0.140)     (0.141)     (0.142)
## sqrt(PFG)                                                                                                                      -0.257      -0.290      -0.354
##                                                                                                                                (0.262)     (0.267)     (0.269)
## BLKPG                                                                                                                                       0.069       0.046
##                                                                                                                                            (0.110)     (0.111)
## TS.                                                                                                                                                     1.693
##                                                                                                                                                        (0.974)
## ----------------------------------------------------------------------------------------------------------------------------------------------------------------
## R-squared           0.203       0.264       0.531       0.542       0.592       0.608       0.620       0.622       0.622       0.623       0.624       0.628
## adj. R-squared      0.200       0.259       0.526       0.536       0.585       0.600       0.611       0.611       0.610       0.610       0.610       0.612
## sigma               1.036       0.997       0.797       0.790       0.746       0.733       0.722       0.722       0.723       0.723       0.724       0.721
## F                  76.990      54.109     113.590      88.696      86.759      77.014      69.294      60.763      53.941      48.637      44.160      41.011
## p                   0.000       0.000       0.000       0.000       0.000       0.000       0.000       0.000       0.000       0.000       0.000       0.000
## Log-likelihood   -442.677    -430.499    -361.743    -358.174    -340.500    -334.413    -329.555    -329.031    -328.834    -328.336    -328.131    -326.562
## Deviance          325.457     300.480     191.430     187.002     166.538     160.022     155.004     154.472     154.273     153.770     153.563     151.992
## AIC               891.353     868.998     733.487     728.348     695.000     684.827     677.110     678.061     679.669     680.672     682.261     681.124
## BIC               902.514     883.880     752.089     750.670     721.042     714.589     710.592     715.264     720.592     725.316     730.626     733.209
## N                 305         305         305         305         305         305         305         305         305         305         305         305
## ================================================================================================================================================================
##
## Calls:
## m1: lm(formula = I(sqrt(salary_infl)) ~ I(career_year), data = subset(nba_stats_salar_non_zero,
## Pos == "SF"))
## m2: lm(formula = I(sqrt(salary_infl)) ~ I(career_year) + stayed,
## data = subset(nba_stats_salar_non_zero, Pos == "SF"))
## m3: lm(formula = I(sqrt(salary_infl)) ~ I(career_year) + stayed +
## FTPG, data = subset(nba_stats_salar_non_zero, Pos == "SF"))
## m4: lm(formula = I(sqrt(salary_infl)) ~ I(career_year) + stayed +
## FTPG + X2PPG, data = subset(nba_stats_salar_non_zero, Pos ==
## "SF"))
## m5: lm(formula = I(sqrt(salary_infl)) ~ I(career_year) + stayed +
## FTPG + X2PPG + X3PPG, data = subset(nba_stats_salar_non_zero,
## Pos == "SF"))
## m6: lm(formula = I(sqrt(salary_infl)) ~ I(career_year) + stayed +
## FTPG + X2PPG + X3PPG + TRBPG, data = subset(nba_stats_salar_non_zero,
## Pos == "SF"))
## m7: lm(formula = I(sqrt(salary_infl)) ~ I(career_year) + stayed +
## FTPG + X2PPG + X3PPG + TRBPG + ASTPG, data = subset(nba_stats_salar_non_zero,
## Pos == "SF"))
## m8: lm(formula = I(sqrt(salary_infl)) ~ I(career_year) + stayed +
## FTPG + X2PPG + X3PPG + TRBPG + ASTPG + STLPG, data = subset(nba_stats_salar_non_zero,
## Pos == "SF"))
## m9: lm(formula = I(sqrt(salary_infl)) ~ I(career_year) + stayed +
## FTPG + X2PPG + X3PPG + TRBPG + ASTPG + STLPG + log(TOVPG),
## data = subset(nba_stats_salar_non_zero, Pos == "SF"))
## m10: lm(formula = I(sqrt(salary_infl)) ~ I(career_year) + stayed +
## FTPG + X2PPG + X3PPG + TRBPG + ASTPG + STLPG + log(TOVPG) +
## sqrt(PFG), data = subset(nba_stats_salar_non_zero, Pos ==
## "SF"))
## m11: lm(formula = I(sqrt(salary_infl)) ~ I(career_year) + stayed +
## FTPG + X2PPG + X3PPG + TRBPG + ASTPG + STLPG + log(TOVPG) +
## sqrt(PFG) + BLKPG, data = subset(nba_stats_salar_non_zero,
## Pos == "SF"))
## m12: lm(formula = I(sqrt(salary_infl)) ~ I(career_year) + stayed +
## FTPG + X2PPG + X3PPG + TRBPG + ASTPG + STLPG + log(TOVPG) +
## sqrt(PFG) + BLKPG + TS., data = subset(nba_stats_salar_non_zero,
## Pos == "SF"))
##
## ==============================================================================================================================================================================
## m1 m2 m3 m4 m5 m6 m7 m8 m9 m10 m11 m12
## ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
## (Intercept) 1.738*** 1.285*** 0.800*** 0.648*** 0.398*** 0.263* 0.318** 0.303** 0.117 0.974** 0.971** 1.865***
## (0.090) (0.115) (0.095) (0.097) (0.104) (0.113) (0.112) (0.113) (0.160) (0.326) (0.327) (0.520)
## I(career_year) 0.097*** 0.113*** 0.085*** 0.080*** 0.066*** 0.063*** 0.054*** 0.054*** 0.054*** 0.053*** 0.053*** 0.052***
## (0.014) (0.013) (0.011) (0.010) (0.010) (0.010) (0.010) (0.010) (0.010) (0.010) (0.010) (0.010)
## stayed 0.669*** 0.427*** 0.395*** 0.347*** 0.318*** 0.295*** 0.283*** 0.285*** 0.291*** 0.292*** 0.291***
## (0.112) (0.089) (0.087) (0.084) (0.084) (0.083) (0.083) (0.083) (0.082) (0.082) (0.082)
## FTPG 0.478*** 0.242*** 0.130* 0.133* 0.099 0.102 0.103 0.081 0.082 0.097
## (0.030) (0.056) (0.057) (0.057) (0.057) (0.057) (0.056) (0.056) (0.057) (0.057)
## X2PPG 0.233*** 0.273*** 0.194*** 0.142** 0.142** 0.163** 0.166** 0.166** 0.173**
## (0.047) (0.046) (0.053) (0.054) (0.054) (0.056) (0.055) (0.055) (0.055)
## X3PPG 0.432*** 0.377*** 0.344*** 0.327*** 0.346*** 0.398*** 0.397*** 0.474***
## (0.077) (0.078) (0.078) (0.078) (0.079) (0.080) (0.081) (0.087)
## TRBPG 0.110** 0.080* 0.051 0.062 0.076 0.078 0.082
## (0.039) (0.039) (0.043) (0.043) (0.043) (0.045) (0.045)
## ASTPG 0.194*** 0.180*** 0.207*** 0.163** 0.163** 0.164**
## (0.052) (0.053) (0.055) (0.057) (0.057) (0.056)
## STLPG 0.208 0.238 0.347* 0.348* 0.341*
## (0.132) (0.133) (0.136) (0.137) (0.136)
## log(TOVPG) -0.177 -0.014 -0.015 -0.053
## (0.108) (0.120) (0.120) (0.121)
## sqrt(PFG) -0.724** -0.720** -0.712**
## (0.241) (0.243) (0.242)
## BLKPG -0.026 -0.026
## (0.180) (0.179)
## TS. -1.941*
## (0.880)
## ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
## R-squared 0.113 0.186 0.504 0.533 0.568 0.577 0.592 0.594 0.597 0.606 0.606 0.611
## adj. R-squared 0.111 0.182 0.500 0.529 0.563 0.571 0.584 0.586 0.588 0.596 0.595 0.599
## sigma 1.139 1.092 0.854 0.829 0.798 0.791 0.778 0.777 0.775 0.767 0.768 0.765
## F 50.443 45.110 133.522 112.318 103.207 88.929 80.747 71.233 63.894 59.592 54.039 50.436
## p 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
## Log-likelihood -615.447 -598.369 -499.719 -487.611 -472.143 -468.042 -461.043 -459.774 -458.398 -453.818 -453.808 -451.312
## Deviance 513.513 471.282 287.070 270.124 249.923 244.826 236.364 234.862 233.243 227.937 227.925 225.084
## AIC 1236.895 1204.739 1009.438 987.222 958.286 952.085 940.085 939.549 938.796 931.636 933.615 930.624
## BIC 1248.854 1220.685 1029.371 1011.141 986.191 983.976 975.963 979.413 982.647 979.474 985.439 986.434
## N 398 398 398 398 398 398 398 398 398 398 398 398
## ==============================================================================================================================================================================
The model explains only about 59% of the variance in the (square-root-transformed) salary variable, as indicated by an R^2 of 0.59. There seem to be factors explaining the salary level other than players' performance on the court. The model works best for power forwards and point guards: their respective R^2 values of 0.628 and 0.632 are the largest among all positions. The model for centers has the lowest R^2, at 0.537. Part of the explanation may be that, after subsetting the data so that every per-game performance statistic is larger than zero, only 135 centers remain out of 438 in the original data frame, since many of them may not have scored a single two- or three-point field goal, for example. Their performance is better measured by rebounds or blocks, and there could be other indicators of their performance not analyzed here. The model is clearly not optimal, but it still shows that players' in-game performance is related to their salary level, although other factors, such as the salary cap and negotiating ability, are probably involved as well.
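The effect of the "larger than zero" filter mentioned above can be illustrated with a minimal sketch. The toy data frame and its values are invented for illustration; only the column names (`Pos`, `X3PPG`, `TOVPG`) follow the report, and the filter shown is an assumption about how such a subset could be built, not necessarily the exact code used for `nba_stats_salar_non_zero`.

```r
# Toy data, invented for illustration; column names follow the report.
toy <- data.frame(
  Pos   = c("C", "C", "C", "PG", "PG", "SF"),
  X3PPG = c(0, 0, 0.4, 1.8, 2.1, 1.2),
  TOVPG = c(1.1, 0, 1.5, 2.9, 3.2, 1.4)
)

# log(0) is -Inf, which is why a term such as log(TOVPG) needs positive values.
log(0)

# Keep only rows where every listed per-game statistic is strictly positive.
stat_cols <- c("X3PPG", "TOVPG")
keep <- rowSums(toy[stat_cols] > 0) == length(stat_cols)
toy_non_zero <- toy[keep, ]

# Rows per position before and after the filter.
before <- table(toy$Pos)
after  <- table(factor(toy_non_zero$Pos, levels = names(before)))
rbind(before, after)
```

In this toy example the centers lose the most rows, because two of the three have no three-pointers, which mirrors the pattern described for the real data.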
As the last plots by position in the multivariate analysis section show, some game performance indicators have a more or less linear relationship with the salary variable, e.g. points per game. Others show a weaker or less linear relationship, or no relationship at all, e.g. three-pointers per game for centers. Different variables work differently for different positions. So I tried to create a linear model using most of the variables explored above and then tested it for the different player positions.
Ranges of variables vary across positions. For example, the range of assists per game for point guards is about one and a half times as large as for centers. This makes sense, since point guards, unlike centers, are the players who handle the ball most and initiate the offense. This difference in range can also be seen for some other variables across positions.
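A comparison of ranges across positions can be sketched as follows. The numbers are invented and chosen only to mirror the ratio of about 1.5 described above; they are not taken from the dataset.

```r
# Toy data, invented for illustration; column names follow the report.
toy <- data.frame(
  Pos   = c("C", "C", "C", "PG", "PG", "PG"),
  ASTPG = c(0.5, 2.0, 4.5, 1.0, 4.0, 7.0)
)

# Width of the observed range (max minus min) of assists per game, by position.
ast_range <- tapply(toy$ASTPG, toy$Pos, function(x) diff(range(x)))
ast_range

# How many times wider the point guards' range is than the centers'.
ast_range[["PG"]] / ast_range[["C"]]
```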
Yes, I created a linear model that tries to explain a player's salary through variables describing his game performance. These variables describe shooting performance, defence performance and so on. Since different positions have a different focus (offence, defence, assists, rebounding etc.), the variables work differently for each position and the models yield different R^2 values. The model for all players has an R^2 of about 0.59, so the chosen game performance indicators and other model variables explain more than half of the variance in players' salaries. A limitation of the model is that it does not take into account other variables that may be important for salary.
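Fitting the same formula separately for each position and comparing R^2 can be sketched as below. The data are simulated, the coefficients used to generate them are arbitrary, and the formula is deliberately reduced to two predictors (`career_year` and `PPG`); it is a simplified stand-in for the full models in the tables, not a reproduction of them.

```r
set.seed(1)
n <- 200

# Simulated data for illustration only; column names follow the report.
sim <- data.frame(
  Pos         = sample(c("C", "PF", "PG", "SF", "SG"), n, replace = TRUE),
  career_year = sample(1:15, n, replace = TRUE),
  PPG         = runif(n, 2, 25)
)

# Generate salary on the square-root scale, then square it back.
sqrt_salary <- 0.5 + 0.07 * sim$career_year + 0.1 * sim$PPG + rnorm(n, sd = 0.3)
sim$salary_infl <- pmax(sqrt_salary, 0)^2

# One linear model per position, all with the same formula.
fit_by_pos <- lapply(split(sim, sim$Pos), function(d) {
  lm(sqrt(salary_infl) ~ career_year + PPG, data = d)
})

# R^2 of each per-position model.
r2 <- sapply(fit_by_pos, function(m) summary(m)$r.squared)
round(r2, 3)
```

Using `split()` and `lapply()` keeps the formula in one place, so any change to the predictors applies to all positions at once.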
The distribution of NBA players' salaries appears to have a bimodal shape on a square root scale. The distribution is skewed to the right, with only a small number of players earning over 20 million USD.
Salaries vary across positions and with career progress. Centers and power forwards earn the most as measured by the interquartile range, but at the same time salaries of players in other positions show many more outliers at the top. Salaries of younger players in their first to third year tend to be much lower than those of players further along in their careers.
Players' salaries (transformed using the square root function) and various game performance indicators appear to be related to each other. Here an example of the relationship between salary and points per game is shown. Different performance metrics showed different relationships to players' salaries across positions. I used some of the per-game performance indicators, such as two- and three-point field goals, free throws, blocks, assists, rebounds and steals, along with career year and the ‘stayed’ variable to build a linear regression model. The stayed variable indicates whether the player stayed with his team or changed teams during or after the season. The model explains about 60% of the variance in players' salaries (based on all players) and delivers slightly different results across positions, with the highest R² of approx. 0.63 for point guards and power forwards. The model fails to explain the rest, since there might be other factors influencing salary, ranging from salary cap rules to the ability of a player's agent to negotiate.
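Because the model is fitted to the square root of salary, its coefficients are not salary changes per unit of a statistic. As a short worked derivation (the symbols $\hat{s}$, $\beta_j$ and $x_j$ are notation introduced here for the predicted salary, the coefficients and the predictors; they do not appear in the report's code), write the fitted model as

$$\sqrt{\hat{s}} = \beta_0 + \sum_j \beta_j x_j .$$

Squaring both sides and differentiating with respect to one predictor gives

$$\hat{s} = \Bigl(\beta_0 + \sum_j \beta_j x_j\Bigr)^2, \qquad \frac{\partial \hat{s}}{\partial x_j} = 2\,\beta_j \sqrt{\hat{s}} .$$

So the same coefficient implies a larger salary change for a player whose predicted salary is already high. Squaring the fitted value ignores the residual variance, so this should be read as an approximate back-transformation rather than an exact expected salary.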
The NBA salaries and performance dataset contains the salaries of NBA players over the last 10 years along with their average game performance metrics and some other player characteristics such as age, weight and height. I started by exploring the chosen variables in the dataset using univariate plots, and then looked at how variables are connected to each other using bivariate and multivariate plots. I then tried to build a model explaining NBA salaries through players' performance and experience.
I struggled at first to understand how variables in the dataset are related to and influenced by each other, for example how the position variable influences one performance indicator or another. Another challenge was to understand the structure of the different datasets I used as input to pull together the final dataset analyzed here. As I expected, salary is connected to players' performance. At the same time, performance indicators vary across positions. It was a surprise to me that players who stay with their teams earn more on average than those who change teams.
Future work with this dataset could explore its other variables and try to build a better model, one that explains more than just 60% of the variance in salary, by finding and incorporating new variables.